HIDDEN MARKOV PROCESSES IN THE CONTEXT OF 
SYMBOLIC DYNAMICS 



MIKE BOYLE AND KARL PETERSEN 



Abstract. In an effort to aid communication among different fields and per- 
haps facilitate progress on problems common to all of them, this article dis- 
cusses hidden Markov processes from several viewpoints, especially that of 
symbolic dynamics, where they are known as sofic measures, or continuous 
shift-commuting images of Markov measures. It provides background, de- 
scribes known tools and methods, surveys some of the literature, and proposes 
several open problems. 



Contents 

1. Introduction
2. Subshift background
2.1. Subshifts
2.2. Sliding block codes
2.3. Measures
2.4. Hidden Markov (sofic) measures
3. Factor maps and thermodynamical concepts
3.1. Markovian and non-Markovian maps
3.2. Thermodynamics on subshifts 001
3.3. Compensation functions
3.4. Relative pressure
3.5. Measures of maximal and relatively maximal entropy
3.6. Finite-to-one codes
3.7. The semigroup measures of Kitchens and Tuncel
4. Identification of hidden Markov measures
4.1. Formal series and formal languages



Received by the editors January 13, 2010. 

2010 Mathematics Subject Classification. Primary: 60K99, 60-02, 37-02; Secondary: 37B10, 
60J10, 37D35, 94A15. 






4.1.1. Basic definitions
4.1.2. Rational series and languages
4.1.3. Distance and topology in F(A)
4.1.4. Recognizable (linearly representable) series
4.2. Equivalent characterizations of hidden Markov measures
4.2.1. Sofic measures: formal series approach
4.2.2. Proof that a series is linearly representable if and only if it is a member of a stable finitely generated submodule of F(A)
4.2.3. Proof that a formal series is linearly representable if and only if it is rational
4.2.4. Linearly representable series correspond to sofic measures
4.3. Sofic measures: Furstenberg's approach
4.4. Sofic measures: Heller's approach
4.4.1. Stochastic module
4.4.2. The reduced stochastic module
4.4.3. Heller's answer to Problem 4.34
4.5. Linear automata and the reduced stochastic module for a finitary measure
4.6. Topological factors of finitary measures, and Nasu's core matrix
5. When is a sofic measure Markov?
5.1. When is the image of a 1-step Markov measure under a 1-block map 1-step Markov?
5.1.1. Stochastic module answer
5.1.2. Linear algebra answer
5.2. Orders of Markov measures under codes
6. Resolving maps and Markovian maps
6.1. Resolving maps
6.2. All factor maps lift 1-1 a.e. to Markovian maps
6.3. Every factor map between SFT's is hidden Markovian
References






1. Introduction 



Symbolic dynamics is the study of shift (and other) transformations on spaces 
of infinite sequences or arrays of symbols and maps between such systems. A sym- 
bolic dynamical system, with a shift-invariant measure, corresponds to a stationary 
stochastic process. In the setting of information theory, such a system amounts 
to a collection of messages. Markov measures and hidden Markov measures, also 
called sofic measures, on symbolic dynamical systems have the desirable property 
of being determined by a finite set of data. But not all of their properties, for 
example the entropy, can be determined by finite algorithms. This article surveys 
some of the known and unknown properties of hidden Markov measures that are of 
special interest from the viewpoint of symbolic dynamics. To keep the article
self-contained, necessary background and related concepts are reviewed briefly. More
can be found in [66, 78, 77, 96].

We discuss methods and tools that have been useful in the study of symbolic sys- 
tems, measures supported on them, and maps between them. Throughout we state 
several problems that we believe to be open and meaningful for further progress. 
We review a swath of the complicated literature starting around 1960 that deals 
with the problem of recognizing hidden Markov measures, as closely related ideas 
were repeatedly rediscovered in varying settings and with varying degrees of gener- 
ality or practicality. Our focus is on the probability papers that relate most closely 
to symbolic dynamics. We have left out much of the literature concerning proba- 
bilistic and linear automata and control, but we have tried to include the main ideas 
relevant to our problems. Some of the explanations that we give and connections 
that we draw are new, as are some results near the end of the article. In Section 5.2
we give bounds on the possible order (memory) if a given sofic measure is in
fact a Markov measure, with the consequence that in some situations there is an
algorithm for determining whether a hidden Markov measure is Markov. In Section 6.3
we show that every factor map is hidden Markovian, in the sense that every
hidden Markov measure on an irreducible sofic subshift lifts to a fully supported
hidden Markov measure.



2. Subshift background

2.1. Subshifts. Let $A$ be a set, usually finite or sometimes countable, which we
consider to be an alphabet of symbols. Then

(2.1) $A^* = \bigcup_{k=0}^{\infty} A^k$

denotes the set of all finite blocks or words with entries from $A$, including the empty
word, $\epsilon$; $A^+$ denotes the set of all nonempty words in $A^*$; $\mathbb{Z}$ denotes the integers and
$\mathbb{Z}_+$ denotes the nonnegative integers. Let $\Omega(A) = A^{\mathbb{Z}}$ and $\Omega^+(A) = A^{\mathbb{Z}_+}$ denote
the sets of all two-sided and one-sided sequences with entries from $A$. If $A = \{0, 1, \dots, d-1\}$
for some integer $d > 1$, we denote $\Omega(A)$ by $\Omega_d$ and $\Omega^+(A)$ by $\Omega_d^+$. Each of these
spaces is a metric space with respect to the metric defined by setting, for $x \neq y$,

(2.2) $k(x,y) = \min\{|j| : x_j \neq y_j\}$ and $d(x,y) = e^{-k(x,y)}$.

For $i \le j$ and $x \in \Omega(A)$ we denote by $x[i,j]$ the block or word $x_i x_{i+1} \dots x_j$. If
$\omega = \omega_0 \dots \omega_{n-1}$ is a block of length $n$, we define

(2.3) $C_0(\omega) = \{y \in \Omega(A) : y[0, n-1] = \omega\}$

and, for $i \in \mathbb{Z}$,

(2.4) $C_i(\omega) = \{y \in \Omega(A) : y[i, i+n-1] = \omega\}$.

The cylinder sets $C_i(\omega)$, $\omega \in A^*$, $i \in \mathbb{Z}$, are open and closed and form a base for the
topology of $\Omega(A)$.

In this paper, a topological dynamical system is a continuous self map of a
compact metrizable space. The shift transformation $\sigma : \Omega_d \to \Omega_d$ is defined by
$(\sigma x)_i = x_{i+1}$ for all $i$. On $\Omega_d$ the maps $\sigma$ and $\sigma^{-1}$ are one-to-one, onto, and
continuous. The pair $(\Omega_d, \sigma)$ forms a topological dynamical system which is called the
full $d$-shift.

If $X$ is a closed $\sigma$-invariant subset of $\Omega_d$, then the topological dynamical system
$(X, \sigma)$ is called a subshift. In this paper, with "$\sigma$-invariant" we include the requirement
that the restriction of the shift be surjective. Sometimes we denote a subshift
$(X, \sigma)$ by only $X$, the shift map being understood implicitly. When dealing with
several subshifts, their possibly different alphabets will be denoted by $A(X)$, $A(Y)$,
etc.

The language $\mathcal{L}(X)$ of the subshift $X$ is the set of all finite words or blocks that
occur as consecutive strings

(2.5) $x[i, i+k-1] = x_i x_{i+1} \dots x_{i+k-1}$

in the infinite sequences $x$ which comprise $X$. Denote by $|w|$ the length of a string
$w$. Then

(2.6) $\mathcal{L}(X) = \{w \in A^* : \text{there are } n \in \mathbb{Z},\ y \in X \text{ such that } w = y_n \dots y_{n+|w|-1}\}$.

Languages of (two-sided) subshifts are characterized by being extractive (or factorial),
which means that every subword of any word in the language is also in
the language, and insertive (or extendable), which means that every word in the
language extends on both sides to a longer word in the language.

For each subshift $(X, \sigma)$ of $(\Omega_d, \sigma)$ there is a set $\mathcal{F}(X)$ of finite "forbidden" words
such that

(2.7) $X = \{x \in \Omega_d : \text{for each } i \le j,\ x_i x_{i+1} \dots x_j \notin \mathcal{F}(X)\}$.

A shift of finite type (SFT) is a subshift $(X, \sigma)$ of some $(\Omega(A), \sigma)$ for which it is
possible to choose the set $\mathcal{F}(X)$ of forbidden words defining $X$ to be finite. (The
choice of the set $\mathcal{F}(X)$ is not uniquely determined.) The SFT is $n$-step if it is possible
to choose the set of words in $\mathcal{F}(X)$ to have length at most $n+1$. We will sometimes
use "SFT" as an adjective describing a dynamical system.

One-step shifts of finite type may be defined by 0, 1 transition matrices. Let $M$
be a $d \times d$ matrix with rows and columns indexed by $A = \{0, 1, \dots, d-1\}$ and
entries from $\{0, 1\}$. Define

(2.8) $\Omega_M = \{\omega \in A^{\mathbb{Z}} : \text{for all } n \in \mathbb{Z},\ M(\omega_n, \omega_{n+1}) = 1\}$.

These were called topological Markov chains by Parry [72]. A topological Markov
chain $\Omega_M$ may be viewed as a vertex shift: its alphabet may be identified with
the vertex set of a finite directed graph such that there is an edge from vertex $i$
to vertex $j$ if and only if $M(i,j) = 1$. (A square matrix with nonnegative integer
entries can similarly be viewed as defining an edge shift, but we will not need edge
shifts in this paper.) A topological Markov chain with transition matrix $M$ as above
is called irreducible if for all $i, j \in A$ there is $k$ such that $M^k(i,j) > 0$. Irreducibility
corresponds to the associated graph being strongly connected.
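Since irreducibility amounts to strong connectivity of the associated graph, it can be tested by a graph search. The following Python sketch is only an illustration; the function name `is_irreducible` and the two sample matrices are assumptions made for this example and are not notation from the text.

```python
def is_irreducible(M):
    """Return True if the 0-1 matrix M is irreducible, i.e. for all i, j
    there is a path of length >= 1 from vertex i to vertex j."""
    d = len(M)

    def reachable(start):
        # Vertices reachable from `start` along a path of positive length.
        seen, stack = set(), [start]
        while stack:
            i = stack.pop()
            for j in range(d):
                if M[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    return all(len(reachable(i)) == d for i in range(d))


golden_mean = [[1, 1], [1, 0]]   # transition matrix forbidding the word 11
one_way = [[1, 1], [0, 1]]       # vertex 1 can never return to vertex 0

print(is_irreducible(golden_mean), is_irreducible(one_way))
```

The search starts with an empty set of visited vertices, so a vertex counts as reachable from itself only if it lies on a cycle, in agreement with the requirement $k \ge 1$ implicit in the definition.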

2.2. Sliding block codes. Let $(X, \sigma)$ and $(Y, \sigma)$ be subshifts on alphabets $A$, $A'$,
respectively. For $k \in \mathbb{N}$, a $k$-block code is a map $\pi : X \to Y$ for which there are
$m, n \ge 0$ with $k = m + n + 1$ and a function $\pi : A^k \to A'$ such that

(2.9) $(\pi x)_i = \pi(x_{i-m} \dots x_i \dots x_{i+n})$.

We will say that $\pi$ is a block code if it is a $k$-block code for some $k$.

Theorem 2.1 (Curtis-Hedlund-Lyndon Theorem). For subshifts $(X, \sigma)$ and $(Y, \sigma)$,
a map $\psi : X \to Y$ is continuous and commutes with the shift ($\psi\sigma = \sigma\psi$) if and
only if it is a block code.
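The rule (2.9) is easy to experiment with on finite words. The sketch below is a hedged illustration, not part of the original exposition: `block_code` and `shift` are hypothetical helper names, a finite list stands in for a window of an infinite sequence, and only positions with a full window of $m$ past and $n$ future symbols are coded. The sample local rule is addition mod 2 of two adjacent symbols.

```python
def block_code(local_rule, m, n):
    """Return a function applying the (m + n + 1)-block code with memory m
    and anticipation n to a finite word, as in (2.9)."""
    def apply(word):
        return [local_rule(tuple(word[i - m:i + n + 1]))
                for i in range(m, len(word) - n)]
    return apply


def shift(word):
    """Finite-window stand-in for the shift: drop the first symbol."""
    return word[1:]


# A 2-block code (m = 0, n = 1) on the alphabet {0, 1}.
xor_code = block_code(lambda w: (w[0] + w[1]) % 2, m=0, n=1)

x = [0, 1, 1, 0, 1, 0, 0, 1]
print(xor_code(x))
# The code commutes with the shift on the positions where both sides are defined.
print(xor_code(shift(x)) == shift(xor_code(x)))
```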

If $(X, T)$ and $(Y, S)$ are topological dynamical systems, then a factor map is a
continuous onto map $\pi : X \to Y$ such that $\pi T = S\pi$. Then $(Y, S)$ is called a factor
of $(X, T)$, and $(X, T)$ is called an extension of $(Y, S)$. A one-to-one factor map is
called an isomorphism or topological conjugacy.

Given a subshift $(X, \sigma)$, $r \in \mathbb{Z}$ and $k \in \mathbb{Z}_+$, there is a block code $\pi = \pi_{r,k}$ onto
the subshift which is the $k$-block presentation of $(X, \sigma)$, by the rule

(2.10) $(\pi x)_i = x[i+r, i+r+1, \dots, i+r+k-1]$ for all $x \in X$.

Here $\pi$ is a topological conjugacy between $(X, \sigma)$ and its image $(X^{[k]}, \sigma)$, which is a
subshift of the full shift on the alphabet $A^k$.

Two factor maps $\phi, \psi$ are topologically equivalent if there exist topological conjugacies
$\alpha, \beta$ such that $\alpha\phi\beta = \psi$. In particular, if $\phi$ is a block code with $(\phi x)_0$
determined by $x[-m, n]$ and $k = m + n + 1$, and $\psi$ is the composition $(\pi_{-m,k})^{-1}$
followed by $\phi$, then $\psi$ is a 1-block code (i.e. $(\psi x)_0 = \psi(x_0)$) which is topologically
equivalent to $\phi$.

A sofic shift is a subshift which is the image of a shift of finite type under a factor
map. A sofic shift $Y$ is irreducible if it is the image of an irreducible shift of finite
type under a factor map. (Equivalently, $Y$ contains a point with a dense forward
orbit. Equivalently, $Y$ contains a point with a dense orbit, and the periodic points
of $Y$ are dense.)

2.3. Measures. Given a subshift $(X, \sigma)$, we denote by $\mathcal{M}(X)$ the set of $\sigma$-invariant
Borel probability measures on $X$. These are the measures for which the coordinate
projections $\pi_n(x) = x_n$ for $x \in X$, $n \in \mathbb{Z}$, form a two-sided finite-state stationary
stochastic process.

Let $P$ be a $d \times d$ stochastic matrix and $p$ a stochastic row vector such that
$pP = p$. (If $P$ is irreducible, then $p$ is unique.) Define a $d \times d$ matrix $M$ with
entries from $\{0, 1\}$ by $M(i,j) = 1$ if and only if $P(i,j) > 0$. Then $P$ determines a
1-step stationary ($\sigma$-invariant) Markov measure $\mu$ on the shift of finite type $\Omega_M$ by

(2.11) $\mu(C_i(\omega[i,j])) = \mu\{y \in \Omega_M : y[i,j] = \omega_i \omega_{i+1} \dots \omega_j\}$
       $= p(\omega_i) P(\omega_i, \omega_{i+1}) \cdots P(\omega_{j-1}, \omega_j)$

(by the Kolmogorov Extension Theorem).

For $k \ge 1$, we say that a measure $\mu \in \mathcal{M}(X)$ is $k$-step Markov (or more simply
$k$-Markov) if for all $i \ge 0$, all $j > k - 1$, and all $x$ in $X$,

(2.12) $\mu(C_0(x[0,i]) \mid C_0(x[-j,-1])) = \mu(C_0(x[0,i]) \mid C_0(x[-k,-1]))$.

A measure is 1-step Markov if and only if it is determined by a pair $(p, P)$ as
above. A measure is $k$-step Markov if and only if its image under the topological
conjugacy taking $(X, \sigma)$ to its $k$-block presentation is 1-step Markov. We say that
a measure is Markov if it is $k$-step Markov for some $k$. The set of $k$-step Markov
measures is denoted by $\mathcal{M}_k$ (adding an optional argument to specify the system
or transformation if necessary). From here on, "Markov" means "shift-invariant
Markov with full support", that is, every nonempty cylinder subset of $X$ has positive
measure. With this convention, a Markov measure with defining matrix $P$ is ergodic
if and only if $P$ is irreducible.

A probabilist might ask for motivation for bringing in the machinery of topological
and dynamical systems when we want to study a stationary stochastic process.
First, looking at $\mathcal{M}(X)$ allows us to consider and compare many measures in a
common setting. By relating them to continuous functions ("thermodynamics";
see Section 3.2 below) we may find some distinguished measures, for example maximal
ones in terms of some variational problem. Second, by topological conjugacy
we might be able to simplify a situation conceptually; for example, many problems
involving block codes reduce to problems involving just 1-block codes. And third,
with topological and dynamical ideas we might see (and know to look for) some
structure or common features, such as invariants of topological conjugacy, behind
the complications of a particular example.

2.4. Hidden Markov (sofic) measures. If $(X, \sigma)$ and $(Y, \sigma)$ are subshifts and
$\pi : X \to Y$ is a sliding block code (factor map), then each measure $\mu \in \mathcal{M}(X)$
determines a measure $\pi\mu \in \mathcal{M}(Y)$ by

(2.13) $(\pi\mu)(E) = \mu(\pi^{-1}E)$ for each measurable $E \subset Y$.

(Some authors write $\pi_*\mu$ or $\mu\pi^{-1}$ for $\pi\mu$.)

If $X$ is SFT, $\mu$ is a Markov measure on $X$ and $\pi : X \to Y$ is a sliding block
code, then $\pi\mu$ on $Y$ is called a hidden Markov measure or sofic measure. (Various
other names, such as "submarkov" and "function of a Markov chain" have also been
used for such a measure or the associated stochastic process.) Thus $\pi\mu$ is a convex
combination of images of ergodic Markov measures. From here on, unless otherwise
indicated, the domain of a Markov measure is assumed to be an irreducible SFT,
and the Markov measure is assumed to have full support (and thus by irreducibility
be ergodic). Likewise, unless otherwise indicated, a sofic measure is assumed to
have full support and to be the image of an ergodic Markov measure. Then the
sofic measure is ergodic and it is defined on an irreducible sofic subshift. Hidden
Markov measures provide a natural way to model systems governed by chance in
which dependence on the past of probabilities of future events is limited (or at
least decays, so that approximation by Markov measures may be reasonable) and
complete knowledge of the state of the system may not be possible.

Hidden Markov processes are often defined as probabilistic functions of Markov 
chains (see for example [33]), but by enlarging the state space each such process can 
be represented as a deterministic function of a Markov chain, such as we consider 
here (see [8]). 
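For a 1-block code, the measure of a cylinder under a hidden Markov measure is a sum of Markov cylinder measures over all preimage words, and this sum can be organized as a product of restricted transition matrices. The Python sketch below illustrates this under stated assumptions: the function names, the 3-state matrix `P` and the code `0 -> 0, 1 -> 1, 2 -> 1` are hypothetical choices made for the example, and the stationary vector is only approximated by iteration.

```python
def stationary_vector(P, iterations=2000):
    """Approximate the stochastic row vector p with pP = p by iterating the
    lazy matrix (P + I)/2, which has the same fixed vectors as P."""
    d = len(P)
    p = [1.0 / d] * d
    for _ in range(iterations):
        p = [0.5 * p[j] + 0.5 * sum(p[i] * P[i][j] for i in range(d))
             for j in range(d)]
    return p


def markov_measure(p, P, word):
    """Measure of the cylinder of `word`: p(w_0) P(w_0, w_1) ... P(w_{n-2}, w_{n-1})."""
    prob = p[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob


def sofic_measure(p, P, code, word):
    """Measure of the cylinder of `word` under the image of the Markov
    measure (p, P) by the 1-block code i -> code[i]."""
    d = len(P)
    # alpha[j]: total measure of upstairs words ending in j that map to the prefix read so far.
    alpha = [p[j] if code[j] == word[0] else 0.0 for j in range(d)]
    for symbol in word[1:]:
        alpha = [sum(alpha[i] * P[i][j] for i in range(d)) if code[j] == symbol else 0.0
                 for j in range(d)]
    return sum(alpha)


# Hypothetical example: every state moves to 0 with probability 1/2.
P = [[0.5, 0.25, 0.25], [0.5, 0.1, 0.4], [0.5, 0.3, 0.2]]
code = [0, 1, 1]
p = stationary_vector(P)
print(sofic_measure(p, P, code, [0, 1, 1]))
```

In this particular example the image measure is the Bernoulli measure with weights (1/2, 1/2), so every word of length $n$ receives measure close to $2^{-n}$; for a general $P$ no such closed form is available, which is the point of the recursion.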

The definition of hidden Markov measure raises several questions. 

Problem 2.2. Let $\mu$ be a 1-step Markov measure on $(X, \sigma)$ and $\pi : X \to Y$ a
1-block code. The image measure may not be Markov (see Example 2.8). What are
necessary and sufficient conditions for $\pi\mu$ to be 1-step Markov?

This problem has been solved, in fact several times. Similarly, given $\mu$ and $\pi$,
it is possible to determine whether $\pi\mu$ is $k$-step Markov. Further, given $\pi$ and a
Markov measure $\mu$, it is possible to specify $k$ such that either $\pi\mu$ is $k$-step Markov
or else is not Markov of any order. These results are discussed in Section 5.

Problem 2.3. Given a shift-invariant measure $\nu$ on $(Y, \sigma)$, how can one tell whether
or not $\nu$ is a hidden Markov measure? If it is, how can one construct Markov
measures of which it is the image?

The answers to Problem 2.3 provided by various authors are discussed in Section
4. The next problem reverses the viewpoint.

Problem 2.4. Given a sliding block code $\pi : X \to Y$ and a Markov measure $\nu$ on
$(Y, \sigma)$, does there exist a Markov measure $\mu$ on $X$ such that $\pi\mu = \nu$?

In Section 3, we take up Problem 2.4 (which apart from special cases remains 
open) and some theoretical background that motivates it. 

Recall that a factor map $\pi : X \to Y$ between irreducible sofic shifts has a degree,
which is the cardinality of the preimage of any doubly transitive point of $Y$ [66].
(If the cardinality is infinite, it can only be the power of the continuum, and we
simply write $\mathrm{degree}(\pi) = \infty$.) If $\pi$ has degree $n < \infty$, then an ergodic measure $\nu$
with full support on $Y$ can lift to at most $n$ ergodic measures on $X$. We say that
the degree of a hidden Markov measure $\nu$, also called its sofic degree, is the minimal
degree of a factor map which sends some Markov measure to $\nu$.

Problem 2.5. Given a hidden Markov measure $\nu$ on $(Y, \sigma)$, how can one determine
the degree of $\nu$? If the degree is $n < \infty$, how can one construct Markov measures
of which $\nu$ is the image under a degree $n$ map?



We conclude this section with examples. 

Example 2.6. An example was given in [69] of a code $\pi : X \to Y$ that is non-Markovian:
some Markov measure on $Y$ does not lift to any Markov measure on
$X$, and hence (see Section 3.1) no Markov measure on $Y$ has a Markov preimage on
$X$. The following diagram presents a simpler example, due to Sujin Shin [91, 93],
of such a map. Here $\pi$ is a 1-block code: $\pi(1) = 1$ and $\pi(j) = 2$ if $j \neq 1$.




Example 2.7. Consider the shifts of finite type given by the graphs below, the
1-block code $\pi$ given by the rule $\pi(a) = a$, $\pi(b_1) = \pi(b_2) = b$, and the Markov
measures $\mu, \nu$ defined by the transition probabilities shown on the edges. We have
$\pi\mu = \nu$, so the code is Markovian: some Markov measure maps to a Markov
measure.




Example 2.8. This example uses the same shifts of finite type and 1-block code as
in Example 2.7, but we define a new 1-step Markov measure on the upstairs shift
of finite type $X$ by assigning transition probabilities as shown.

The entropy of the Markov measure $\mu$ (the definition is recalled in Sec. 3.2) is
readily obtained from the familiar formula $-\sum p_i P_{ij} \log P_{ij}$, but there is no such
simple rule for computing the entropy of $\nu$. If $\nu$ were the finite-to-one image of some
other Markov measure $\mu'$, maybe on some other shift of finite type, then we would
have $h(\nu) = h(\mu')$ and the entropy of $\nu$ would be easily computed by applying the
familiar formula to $\mu'$. But for this example (due to Blackwell [13]) it can be shown
[69] that $\nu$ is not the finite-to-one image of any Markov measure. Thus Problem
2.5 is relevant to the much-studied problem of estimating the entropy of a hidden
Markov measure (see [44, 45] and their references).
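The familiar formula for the entropy of a Markov measure is a finite sum and can be evaluated directly. The following Python sketch is an illustration only; the function name `markov_entropy` and the sample chain on the golden mean shift are assumptions made for this example, and terms with $P_{ij} = 0$ are skipped by the usual convention $0 \log 0 = 0$.

```python
import math


def markov_entropy(p, P):
    """Entropy -sum_{i,j} p_i P_ij log P_ij of the 1-step Markov measure
    with stationary vector p and stochastic matrix P."""
    d = len(P)
    return -sum(p[i] * P[i][j] * math.log(P[i][j])
                for i in range(d) for j in range(d) if P[i][j] > 0)


# Hypothetical chain supported on the golden mean shift (no word 11).
P = [[0.5, 0.5], [1.0, 0.0]]
p = [2.0 / 3.0, 1.0 / 3.0]     # stationary: pP = p
print(markov_entropy(p, P))
```

For this chain only state 0 contributes, and the value is $(2/3)\log 2$.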

Example 2.9. In this example presented in [97], $X = Y = \Omega_2$ = full 2-shift, and
the factor map is the 2-block code

(2.14) $(\pi x)_0 = x_0 + x_1 \mod 2$.

Suppose $0 < p < 1$ and $\mu_p$ is the Bernoulli (product) measure on $X$, with $\mu_p(C_0(1)) = p$.
Let $\nu_p$ denote the hidden Markov measure $\pi\mu_p = \pi\mu_{1-p}$. If $p \neq 1/2$, then $\nu_p$ is
a hidden Markov measure strictly of degree 2 (it is not degree 1).
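The identity $\pi\mu_p = \pi\mu_{1-p}$ in Example 2.9 can be checked on cylinders by direct enumeration: a word of length $n$ downstairs has exactly two preimage words of length $n+1$, determined by the choice of $x_0$, and they are complements of each other. The Python sketch below is a hedged illustration; the function names `bernoulli` and `nu` are assumptions made for this example.

```python
from itertools import product


def bernoulli(p, word):
    """Measure of the cylinder of `word` under the product measure with P(1) = p."""
    prob = 1.0
    for s in word:
        prob *= p if s == 1 else 1.0 - p
    return prob


def nu(p, word):
    """Measure of the cylinder of `word` under the image of the Bernoulli
    measure by the 2-block code (pi x)_0 = x_0 + x_1 mod 2."""
    total = 0.0
    for x0 in (0, 1):
        # The preimage word is determined by x0: x_{i+1} = x_i + word_i mod 2.
        x = [x0]
        for s in word:
            x.append((x[-1] + s) % 2)
        total += bernoulli(p, x)
    return total


words = list(product((0, 1), repeat=5))
print(max(abs(nu(0.3, w) - nu(0.7, w)) for w in words))
```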

3. Factor maps and thermodynamical concepts 

3.1. Markovian and non-Markovian maps. We have mentioned (Example 2.8)
that the image under a factor map $\pi : X \to Y$ of a Markov measure need not be
Markov, and (Example 2.6) that a Markov measure on $Y$ need not have any Markov
preimages. In this section we study maps that do not have the latter undesirable
property. Recall our convention: a Markov measure is required to have full support.

Definition 3.1. [18] A factor map $\pi : \Omega_A \to \Omega_B$ between irreducible shifts of
finite type ($A$ and $B$ are 0, 1 transition matrices, see (2.8)) is Markovian if for every
Markov measure $\nu$ on $\Omega_B$, there is a Markov measure $\mu$ on $\Omega_A$ such that $\pi\mu = \nu$.

Theorem 3.2. [18] For a factor map $\pi : \Omega_A \to \Omega_B$ between irreducible shifts of
finite type, if there exist any fully supported Markov $\mu$ and $\nu$ with $\pi\mu = \nu$, then $\pi$
is Markovian.

Note that if a factor map is Markovian, then so too is every factor map which
is topologically equivalent to it, because a topological conjugacy takes Markov
measures to Markov measures. We will see a large supply of Markovian maps (the
"e-resolving factor maps") in Section 6.1.

These considerations lead to a reformulation of Problem 2.4:

Problem 3.3. Give a procedure to decide, given a factor map $\pi : \Omega_A \to \Omega_B$,
whether $\pi$ is Markovian.

We sketch the proof of Theorem 3.2 for the 1-step Markov case: if any 1-step
Markov measure on $\Omega_B$ lifts to a 1-step Markov measure, then every 1-step Markov
measure on $\Omega_B$ lifts to a 1-step Markov measure. For this, recall that if $M$ is an
irreducible matrix with spectral radius $\rho$, with positive right eigenvector $r$, then
the stochasticization of $M$ is the stochastic matrix

(3.1) $\mathrm{stoch}(M) = \frac{1}{\rho} D^{-1} M D$,

where $D$ is the diagonal matrix with diagonal entries $D(i,i) = r(i)$.
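Entrywise, (3.1) reads $\mathrm{stoch}(M)(i,j) = M(i,j)\, r(j) / (\rho\, r(i))$. The Python sketch below computes it for a small matrix; it is an illustration under stated assumptions, not a robust numerical routine. The names `perron` and `stoch` are chosen for this example, and the eigenvector is approximated by power iteration on $M + I$, which has the same eigenvectors as $M$ and converges for irreducible $M$ even when $M$ is periodic.

```python
def perron(M, iterations=2000):
    """Approximate the spectral radius rho and a positive right eigenvector r
    (normalized to sum 1) of an irreducible nonnegative matrix M."""
    d = len(M)
    r = [1.0] * d
    for _ in range(iterations):
        s = [r[i] + sum(M[i][j] * r[j] for j in range(d)) for i in range(d)]
        total = sum(s)
        r = [v / total for v in s]
    Mr = [sum(M[i][j] * r[j] for j in range(d)) for i in range(d)]
    rho = sum(Mr) / sum(r)
    return rho, r


def stoch(M):
    """Stochasticization (3.1): entry (i, j) is M(i,j) r(j) / (rho r(i))."""
    rho, r = perron(M)
    d = len(M)
    return [[M[i][j] * r[j] / (rho * r[i]) for j in range(d)] for i in range(d)]


golden_mean = [[1, 1], [1, 0]]
for row in stoch(golden_mean):
    print(row)
```

For the golden mean matrix the spectral radius is $(1+\sqrt{5})/2$, and each row of the result sums to 1 up to rounding.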

Now suppose that $\pi : \Omega_A \to \Omega_B$ is a 1-block factor map, with $\pi(i)$ denoted $\bar{i}$
for all $i$ in the alphabet of $\Omega_A$; that $\mu, \nu$ are 1-step Markov measures defined by
stochastic matrices $P, Q$; and that $\pi\mu = \nu$. Suppose that $\nu' \in \mathcal{M}(\Omega_B)$ is defined by
a stochastic matrix $Q'$. We will find a stochastic matrix $P'$ defining $\mu'$ in $\mathcal{M}(\Omega_A)$
such that $\pi\mu' = \nu'$.

First define a matrix $M$ of size matching $P$ by $M(i,j) = 0$ if $P(i,j) = 0$ and
otherwise

(3.2) $M(i,j) = Q'(\bar{i},\bar{j}) P(i,j) / Q(\bar{i},\bar{j})$.

This matrix $M$ will have spectral radius 1. Now set $P' = \mathrm{stoch}(M)$. The proof that
$\pi\mu' = \nu'$ is a straightforward computation that $\pi\mu' = \nu'$ on cylinders $C_0(y[0,n])$
for all $n \in \mathbb{N}$ and $y \in \Omega_B$. This construction is the germ of a more general
thermodynamic result, the background for which we develop in the next section.
We finish this section with an example.

Example 3.4. In this example one sees explicitly how being able to lift one Markov
measure to a Markov measure allows one to lift other Markov measures to Markov
measures.

Consider the 1-block code $\pi$ from $\Omega_3 = \{0, 1, 2\}^{\mathbb{Z}}$ to $\Omega_2 = \{0, 1\}^{\mathbb{Z}}$, via $0 \mapsto 0$ and
$1, 2 \mapsto 1$. Let $\nu$ be the 1-step Markov measure on $\Omega_2$ given by the transition matrix

$\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}$.

Given positive numbers $\alpha, \beta, \gamma < 1$, the stochastic matrix

(3.3) $\begin{pmatrix} 1/2 & \alpha(1/2) & (1-\alpha)(1/2) \\ 1/2 & \beta(1/2) & (1-\beta)(1/2) \\ 1/2 & \gamma(1/2) & (1-\gamma)(1/2) \end{pmatrix}$

defines a 1-step Markov measure on $\Omega_3$ which $\pi$ sends to $\nu$.

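Now, if $\nu'$ is any other 1-step Markov measure on $\Omega_2$, given by a stochastic
matrix

$\begin{pmatrix} p & q \\ r & s \end{pmatrix}$,

then $\nu'$ will lift to the 1-step Markov measure defined by the stochastic matrix

(3.4) $\begin{pmatrix} p & \alpha q & (1-\alpha) q \\ r & \beta s & (1-\beta) s \\ r & \gamma s & (1-\gamma) s \end{pmatrix}$.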
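The lift in Example 3.4 can be checked numerically: applying (3.2) to the data of the example produces a matrix that is already stochastic, and the image of the resulting Markov measure agrees with $\nu'$ on short cylinders. The Python sketch below is an illustration under stated assumptions. All function names are hypothetical, the numerical values chosen for $\alpha, \beta, \gamma$ and for $p, q, r, s$ are arbitrary, and agreement is tested only on words of length at most 4.

```python
from itertools import product


def lifted_matrix(P, Q, Q_new, code):
    """Formula (3.2): M(i,j) = Q'(code[i], code[j]) P(i,j) / Q(code[i], code[j]),
    with M(i,j) = 0 wherever P(i,j) = 0."""
    d = len(P)
    return [[Q_new[code[i]][code[j]] * P[i][j] / Q[code[i]][code[j]]
             if P[i][j] > 0 else 0.0
             for j in range(d)] for i in range(d)]


def stationary_vector(P, iterations=2000):
    """Approximate p with pP = p by iterating the lazy matrix (P + I)/2."""
    d = len(P)
    p = [1.0 / d] * d
    for _ in range(iterations):
        p = [0.5 * p[j] + 0.5 * sum(p[i] * P[i][j] for i in range(d))
             for j in range(d)]
    return p


def markov_measure(p, P, word):
    """Cylinder measure p(w_0) P(w_0, w_1) ... P(w_{n-2}, w_{n-1})."""
    prob = p[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob


def image_measure(p, P, code, word):
    """Sum of the upstairs Markov measure over all preimage words of `word`."""
    d = len(P)
    return sum(markov_measure(p, P, x)
               for x in product(range(d), repeat=len(word))
               if all(code[a] == b for a, b in zip(x, word)))


# Arbitrary illustrative values for alpha, beta, gamma and for (p q; r s).
alpha, beta, gamma = 0.2, 0.5, 0.9
code = [0, 1, 1]                                   # 0 -> 0 and 1, 2 -> 1
Q = [[0.5, 0.5], [0.5, 0.5]]
P = [[0.5, t * 0.5, (1 - t) * 0.5] for t in (alpha, beta, gamma)]
Q_new = [[0.3, 0.7], [0.6, 0.4]]
P_new = lifted_matrix(P, Q, Q_new, code)           # rows already sum to 1 here
p_new = stationary_vector(P_new)
q_new = stationary_vector(Q_new)

worst = max(abs(image_measure(p_new, P_new, code, w) - markov_measure(q_new, Q_new, w))
            for n in (1, 2, 3, 4) for w in product((0, 1), repeat=n))
print(worst)
```

Because the matrix produced by (3.2) is stochastic in this example, its stochasticization is the matrix itself, which is why no eigenvector computation appears in the sketch.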
3.2. Thermodynamics on subshifts 001. We recall the definitions of entropy
and pressure and how the thermodynamical approach provides convenient machinery
for dealing with Markov measures (and hence eventually, it is hoped, with
hidden Markov measures).

Let $(X, \sigma)$ be a subshift and $\mu \in \mathcal{M}(X)$ a shift-invariant Borel probability measure
on $X$. The topological entropy of $(X, \sigma)$ is

(3.5) $h(X) = \lim_{n \to \infty} \frac{1}{n} \log |\{x[0, n-1] : x \in X\}|$.

The measure-theoretic entropy of the measure-preserving system $(X, \sigma, \mu)$ is

(3.6) $h(\mu) = h_\mu(X) = \lim_{n \to \infty} \frac{1}{n} \sum \{-\mu(C_0(w)) \log \mu(C_0(w)) : w \in \{x[0, n-1] : x \in X\}\}$.

(For more background on these concepts, one could consult [78, 96].)
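For a topological Markov chain $\Omega_M$, the count in (3.5) is a count of paths in the graph of $M$, so the limit can be approximated with integer arithmetic. The Python sketch below is illustrative only. The name `count_words` is an assumption made for this example, and the code assumes that $M$ has no zero row or column, so that every allowed path extends to a point of $\Omega_M$. It is a standard fact, not proved here, that for irreducible $M$ the limit equals the logarithm of the spectral radius of $M$.

```python
import math


def count_words(M, n):
    """Number of words of length n in the language of the topological
    Markov chain defined by the 0-1 matrix M (paths with n vertices)."""
    d = len(M)
    ends = [1] * d                    # words of length 1 ending at each symbol
    for _ in range(n - 1):
        ends = [sum(ends[i] * M[i][j] for i in range(d)) for j in range(d)]
    return sum(ends)


golden_mean = [[1, 1], [1, 0]]
print([count_words(golden_mean, n) for n in (1, 2, 3, 4)])
print(math.log(count_words(golden_mean, 200)) / 200)   # finite-n estimate of h(X)
print(math.log((1 + 5 ** 0.5) / 2))                    # log of the spectral radius
```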

Pressure is a refinement of entropy which takes into account not only the map
$\sigma : X \to X$ but also weights coming from a given "potential function" $f$ on $X$.
Given a continuous real-valued function $f \in C(X, \mathbb{R})$, we define the pressure of $f$
(with respect to $\sigma$) to be

(3.7) $P(f, \sigma) = \lim_{n \to \infty} \frac{1}{n} \log \sum \{\exp[S_n(f, w)] : w \in \{x[0, n-1] : x \in X\}\}$,

where

(3.8) $S_n(f, w) = \sum_{i=0}^{n-1} f(\sigma^i x)$ for some $x \in X$ such that $x[0, n-1] = w$.

(In the limit the choice of $x$ doesn't matter.) Thus,

(3.9) if $f \equiv 0$, then $P(f, \sigma) = h(X)$.

The pressure functional satisfies the important Variational Principle:

(3.10) $P(f, \sigma) = \sup\{h(\mu) + \int f \, d\mu : \mu \in \mathcal{M}(X)\}$.

An equilibrium state for $f$ (with respect to $\sigma$) is a measure $\mu = \mu_f$ such that

(3.11) $P(f, \sigma) = h(\mu) + \int f \, d\mu$.

Often (e.g., when the potential function $f$ is Hölder continuous on an irreducible
shift of finite type), there is a unique equilibrium state $\mu_f$ which is a (Bowen) Gibbs
measure for $f$: i.e., $P(f, \sigma) = \log(\rho)$, and

(3.12) $\mu_f(C_0(x[0, n-1])) \sim \rho^{-n} \exp S_n f(x)$.

Here "$\sim$" means the ratio of the two sides is bounded above and away from zero,
uniformly in $x$ and $n$.

If $f \in C(\Omega_A, \mathbb{R})$ depends on only two coordinates, $f(x) = f(x_0 x_1)$ for all $x \in \Omega_A$,
then $f$ has a unique equilibrium state $\mu_f$, and $\mu_f \in \mathcal{M}(\Omega_A)$. This measure $\mu_f$ is
the 1-step Markov measure defined by the stochastic matrix $P = \mathrm{stoch}(Q)$, where

(3.13) $Q(i,j) = \begin{cases} 0 & \text{if } A(i,j) = 0, \\ \exp[f(ij)] & \text{otherwise.} \end{cases}$

(For an exposition see [73].)

The pressure of $f$ is $\log\rho$, where $\rho$ is the spectral radius of $Q$. Conversely, a
Markov measure with stochastic transition matrix $P$ is the equilibrium state of the
potential function $f[ij] = \log P(i,j)$.

By passage to the $k$-block presentation, we can generalize to the case of $k$-step
Markov measures: if $f(x) = f(x_0 x_1 \cdots x_k)$, then $f$ has a unique equilibrium state
$\mu$, and $\mu$ is a $k$-step Markov measure.

Definition 3.5. We say that a function on a subshift $X$ is locally constant if there
is $m \in \mathbb{N}$ such that $f(x)$ depends only on $x[-m, m]$. $LC(X, \mathbb{R})$ is the vector space
of locally constant real-valued functions on $X$. $C_k(X, \mathbb{R})$ is the set of $f$ in $LC(X, \mathbb{R})$
such that $f(x)$ is determined by $x[0, k-1]$.

We can now express a viewpoint on Markov measures, due to Parry and Tuncel
[95, 74], which follows from the previous results.

Theorem 3.6. [74] Suppose $\Omega_A$ is an irreducible shift of finite type; $k \ge 1$; and
$f, g \in C_k(X, \mathbb{R})$. Then the following are equivalent.

(1) $\mu_f = \mu_g$.

(2) There are $h \in C(X, \mathbb{R})$ and $c \in \mathbb{R}$ such that $f = g + (h - h \circ \sigma) + c$.

(3) There are $h \in C_{k-1}(X, \mathbb{R})$ and $c \in \mathbb{R}$ such that $f = g + (h - h \circ \sigma) + c$.

Proposition 3.7. [74] Suppose $\Omega_A$ is an irreducible shift of finite type. Let



and these maps are bijections.

3.3. Compensation functions. Let $\pi : (X, T) \to (Y, S)$ be a factor map between
topological dynamical systems. A compensation function for the factor map is a
continuous function $\xi : X \to \mathbb{R}$ such that

(3.15) $P_Y(V) = P_X(V \circ \pi + \xi)$ for all $V \in C(Y, \mathbb{R})$.

Because $h(\pi\mu) \le h(\mu)$ and $\int V \, d(\pi\mu) = \int V \circ \pi \, d\mu$, we always have

(3.16) $P_Y(V) = \sup\{h(\nu) + \int_Y V \, d\nu : \nu \in \mathcal{M}(Y)\}$
(3.17) $\le \sup\{h(\mu) + \int_X V \circ \pi \, d\mu : \mu \in \mathcal{M}(X)\} = P_X(V \circ \pi)$,

with possible strict inequality when $\pi$ is infinite-to-one, in which case a strict inequality
$h(\mu) > h(\pi\mu)$ can arise from (informally) the extra information/complexity
arising from motion in fibers over points of $Y$. The pressure equality (3.15) tells
us that the addition of a compensation function $\xi$ to the functions $V \circ \pi$ takes into
account (and exactly cancels out), for all potential functions $V$ on $Y$ at once, this
measure of extra complexity. Compensation functions were introduced in [18] and
studied systematically in [97]. A compensation function is a kind of oracle for how
entropy can appear in a fiber. The Markovian case is the case in which the oracle
has finite range, that is, there is a locally constant compensation function.

A compensation function for a factor map $\pi : X \to Y$ is saturated if it has the
form $G \circ \pi$ for a continuous function $G$ on $Y$.

Example 3.8. For the factor map in Examples 2.7 and 2.8, the formula

(3.18) $G(y) = \begin{cases} -\log 2 & \text{if } y = .a\dots \\ 0 & \text{if } y = .b\dots \end{cases}$

determines a saturated compensation function $G \circ \pi$ on $\Omega_A$. The sum (or cocycle)
$S_n G(y) = G(y) + G(\sigma y) + \cdots + G(\sigma^{n-1} y)$ measures the growth of the number of
preimages of initial blocks of $y$:

(3.19) $|\pi^{-1}(y_0 \dots y_{n-1})| = 2^{\#\{i : y_i = a,\ 0 \le i < n\} \pm 1} \sim 2^{\#\{i : y_i = a,\ 0 \le i < n\}} = e^{-S_n G(y)}$.

Example 3.9. In the situation described at the end of Section 3.1, in which a 1-step
Markov measure maps to a 1-step Markov measure under a 1-block map, an
associated compensation function is

(3.20) $\xi(x) = \log P(i,j) - \log Q(\bar{i},\bar{j})$ when $x_0 x_1 = ij$.
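For potentials depending on two coordinates, pressure is the logarithm of a spectral radius (Section 3.2), so the defining equality $P_Y(V) = P_X(V \circ \pi + \xi)$ can be tested numerically for the function in (3.20). The Python sketch below does this for the code $0 \mapsto 0$, $1, 2 \mapsto 1$ between full shifts. It is a hedged illustration: the function names, the upstairs matrix `P` (a hypothetical instance of the form (3.3)) and the potential `V` are assumptions made for this example, and the test covers only two-coordinate potentials $V$, not all of $C(Y, \mathbb{R})$.

```python
import math


def spectral_radius(M, iterations=2000):
    """Perron eigenvalue of an irreducible nonnegative matrix, approximated
    by power iteration on M + I (same eigenvectors as M)."""
    d = len(M)
    r = [1.0] * d
    for _ in range(iterations):
        s = [r[i] + sum(M[i][j] * r[j] for j in range(d)) for i in range(d)]
        total = sum(s)
        r = [v / total for v in s]
    # r sums to 1, so the sum of the entries of M r approximates rho.
    return sum(sum(M[i][j] * r[j] for j in range(d)) for i in range(d))


def pressure(f, A):
    """log of the spectral radius of Q(i,j) = exp f(i,j) where A(i,j) = 1, else 0."""
    d = len(A)
    Q = [[math.exp(f[i][j]) if A[i][j] else 0.0 for j in range(d)] for i in range(d)]
    return math.log(spectral_radius(Q))


code = [0, 1, 1]
A = [[1] * 3 for _ in range(3)]          # full 3-shift upstairs
B = [[1] * 2 for _ in range(2)]          # full 2-shift downstairs
P = [[0.5, 0.1, 0.4], [0.5, 0.25, 0.25], [0.5, 0.45, 0.05]]
Q = [[0.5, 0.5], [0.5, 0.5]]

# The function of (3.20), as a table indexed by the pair x_0 x_1 = ij.
xi = [[math.log(P[i][j]) - math.log(Q[code[i]][code[j]]) for j in range(3)]
      for i in range(3)]

V = [[0.7, -1.2], [0.3, 2.0]]            # arbitrary two-coordinate potential on Y
lifted = [[V[code[i]][code[j]] + xi[i][j] for j in range(3)] for i in range(3)]

print(pressure(V, B), pressure(lifted, A))
```

The two printed values agree up to rounding because a positive eigenvector of the downstairs matrix, composed with the code, is an eigenvector of the upstairs matrix with the same eigenvalue.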

Theorem 3.10. [18, 97] Suppose that $\pi : \Omega_A \to \Omega_B$ is a factor map between
irreducible shifts of finite type, with $f \in LC(\Omega_A)$ and $g \in LC(\Omega_B)$, and $\pi\mu_f = \mu_g$.
Then there is a constant $c$ such that $f - g \circ \pi + c$ is a compensation function.
Conversely, if $\xi$ is a locally constant compensation function, then $\mu_{\xi + g \circ \pi}$ is Markov
and $\pi\mu_{\xi + g \circ \pi} = \mu_g$.

In Theorem 3.10, the locally constant compensation function $\xi$ relates potential
functions on $\Omega_B$ to their lifts by composition on $\Omega_A$ in the same way that the
corresponding equilibrium states are related:

$LC(\Omega_B) \to LC(\Omega_A)$ via $g \mapsto (g \circ \pi) + \xi$,
$\mathcal{M}(\Omega_B) \to \mathcal{M}(\Omega_A)$ via $\mu_g \mapsto \mu_{(g \circ \pi) + \xi}$.

Theorem 3.10 holds if we replace the class of locally constant functions with the
class of Hölder (exponentially decaying) functions, or with functions in the larger
and more complicated "Walters class" (defined in [97, Section 4]). More generally,
the arguments in [97, Theorem 4.1] go through to prove the following.

Theorem 3.11. Suppose that $\pi : \Omega_A \to \Omega_B$ is a factor map between irreducible
shifts of finite type. Let $V_A$, $V_B$ be real vector spaces of functions in $C(\Omega_A, \mathbb{R})$, $C(\Omega_B, \mathbb{R})$
respectively such that the following hold.

(1) $V_A$ and $V_B$ contain the locally constant functions.

(2) If $f$ is in $V_A$ or $V_B$, then $f$ has a unique equilibrium state $\mu_f$, and $\mu_f$ is a
Gibbs measure.

(3) If $f \in V_B$, then $f \circ \pi \in V_A$.

Suppose $f \in V_A$ and $g \in V_B$, and $\pi\mu_f = \mu_g$. Then there is a constant $C$ such that
$f - g \circ \pi + C$ is a compensation function. Conversely, if $\xi$ in $V_A$ is a compensation
function, then for all $g \in V_B$ it holds that $\pi\mu_{\xi + g \circ \pi} = \mu_g$.

Moreover, if $G \in V_B$, then $G \circ \pi$ is a compensation function if and only if there
is $c > 1$ such that

(3.22) $\frac{1}{c} \le e^{S_n G(y)} \, |\pi^{-1}(y_0 \dots y_{n-1})| \le c$ for all $y$, $n$.

Problem 3.12. Determine whether there exists a factor map $\pi : X \to Y$ between
mixing SFT's and a potential function $F \in C(X)$ which is not a compensation
function but has a unique equilibrium state $\mu_F$ whose image $\pi\mu_F$ is the measure
of maximal entropy on $Y$. If there were such an example, it would show that the
assumptions on function classes in Theorem 3.11 cannot simply be dropped.

We finish this section with some more general statements about compensation 
functions for factor maps between shifts of finite type. 

Proposition 3.13. [97] Suppose that $\pi : \Omega_A \to \Omega_B$ is a factor map between irreducible
shifts of finite type. Then

(1) There exists a compensation function.

(2) If $\xi$ is a compensation function, $g \in C(\Omega_B, \mathbb{R})$, and $\mu$ is an equilibrium
state of $\xi + g \circ \pi$, then $\pi\mu$ is an equilibrium state of $g$.

(3) The map $\pi$ takes the measure of maximal entropy (see Section 3.5) of $\Omega_A$
to that of $\Omega_B$ if and only if there is a constant compensation function.

Yuki Yayama [99] has begun the study of compensation functions which are
bounded Borel functions.



3.4. Relative pressure. When studying factor maps, relativized versions of entropy
and pressure are relevant concepts. Given a factor map $\pi : \Omega_A \to \Omega_B$ between
shifts of finite type, for each $n = 1, 2, \dots$ and $y \in Y$, let $D_n(y)$ be a set consisting of
exactly one point from each nonempty set $[x_0 \dots x_{n-1}] \cap \pi^{-1}(y)$. Let $V \in C(\Omega_A, \mathbb{R})$
be a potential function on $\Omega_A$. For each $y \in \Omega_B$, the relative pressure of $V$ at $y$
with respect to $\pi$ is defined to be

(3.23) $P(\pi, V)(y) = \limsup_{n \to \infty} \frac{1}{n} \log \Big[ \sum_{x \in D_n(y)} \exp\Big( \sum_{i=0}^{n-1} V(\sigma^i x) \Big) \Big]$.

The relative topological entropy function is defined for all $y \in Y$ by

(3.24) $P(\pi, 0)(y) = \limsup_{n \to \infty} \frac{1}{n} \log |D_n(y)|$,

the relative pressure of the potential function $V \equiv 0$.

For the relative pressure function, a Relative Variational Principle was proved
by Ledrappier and Walters ([64], see also [30]): for all $\nu$ in $\mathcal{M}(\Omega_B)$ and all $V$ in
$C(\Omega_A)$,

(3.25) $\int P(\pi, V)\, d\nu = \sup\Big\{ h(\mu) + \int V\, d\mu : \pi\mu = \nu \Big\} - h(\nu)$.

In particular, for a fixed $\nu \in \mathcal{M}(\Omega_B)$, the maximum measure-theoretic entropy
of a measure on $\Omega_A$ that maps under $\pi$ to $\nu$ is given by

(3.26) $h(\nu) + \sup\{h_\mu(X|Y) : \pi\mu = \nu\} = h(\nu) + \sup\{h(\mu) - h(\nu) : \pi\mu = \nu\} = h(\nu) + \int P(\pi, 0)\, d\nu$.

In [80] a finite-range, combinatorial approach was developed for the relative
pressure and entropy, in which instead of examining entire infinite sequences $x$ in
each fiber over a given point $y \in \Omega_B$, it is enough to deal just with preimages of
finite blocks (which may or may not be extendable to full sequences in the fiber).
For each $n = 1, 2, \dots$ and $y \in Y$ let $E_n(y)$ be a set consisting of exactly one point
from each nonempty cylinder $x[0, n-1] \subset \pi^{-1} y[0, n-1]$. Then for each $V \in C(\Omega_A)$,

(3.27) $P(\pi, V)(y) = \limsup_{n \to \infty} \frac{1}{n} \log \Big[ \sum_{x \in E_n(y)} \exp\Big( \sum_{i=0}^{n-1} V(\sigma^i x) \Big) \Big]$

a.e. with respect to every ergodic invariant measure on $Y$. Thus, we obtain the
value of $P(\pi, V)(y)$ a.e. with respect to every ergodic invariant measure on $Y$ if we
delete from the definition of $D_n(y)$ the requirement that $x \in \pi^{-1}(y)$.

In particular, the relative topological entropy is given by

(3.28) $P(\pi, 0)(y) = \limsup_{n \to \infty} \frac{1}{n} \log |\pi^{-1} y[0, n-1]|$

a.e. with respect to every ergodic invariant measure on $Y$.
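As an illustration of the finite-block formula (3.27), the following minimal Python sketch evaluates the expression inside the lim sup for one fixed block of $y$. It rests on assumptions that are not part of the text: $\pi$ is a 1-block code given as a list `pi` on the states of a vertex shift whose 0,1 matrix `A` has no zero rows or columns (so every `A`-allowed word lies in the language of the domain), and the potential depends only on the coordinate $x_0$ and is given as a list `V`. The function name `relative_pressure_estimate` is hypothetical.

```python
import math

def relative_pressure_estimate(A, pi, V, y_block):
    """Return (1/n) log sum_{x in pi^{-1} y[0,n-1]} exp(sum_i V[x_i]).

    A       : 0/1 adjacency matrix of the domain vertex shift (assumed input)
    pi      : pi[s] is the image symbol of state s (1-block code, assumed)
    V       : V[s] is the value of a one-coordinate potential at x_0 = s (assumed)
    y_block : the block y[0, n-1], as a sequence of image symbols
    """
    n = len(y_block)
    states = range(len(A))
    # weight[s] = total exp-weight of allowed words ending in state s
    # whose image is the prefix of y_block read so far
    weight = [math.exp(V[s]) if pi[s] == y_block[0] else 0.0 for s in states]
    for symbol in y_block[1:]:
        weight = [
            math.exp(V[t]) * sum(weight[s] for s in states if A[s][t])
            if pi[t] == symbol else 0.0
            for t in states
        ]
    total = sum(weight)
    return math.log(total) / n if total > 0 else float("-inf")

# Toy data (an assumption for illustration): full 3-shift, two states collapse to "a".
A_full = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
pi_toy = ["a", "a", "b"]
print(relative_pressure_estimate(A_full, pi_toy, [0.0, 0.0, 0.0], "aab"))
```

With `V` identically zero the returned value is $(1/n)\log|\pi^{-1}y[0,n-1]|$, the quantity appearing in (3.28); the lim sup itself is of course not computed by a single block.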

And if $\mu$ is relatively maximal over $\nu$, in the sense that it achieves the supremum
in (3.26), then

(3.29) $h_\mu(X|Y) = \int_Y \lim_{n \to \infty} \frac{1}{n} \log |\pi^{-1} y[0, n-1]| \, d\nu(y)$.



3.5. Measures of maximal and relatively maximal entropy. Already Shannon [90]
constructed the measures of maximal entropy on irreducible shifts of finite
type. Parry [72] independently and from the dynamical viewpoint rediscovered
the construction and proved uniqueness. For an irreducible shift of finite type the
unique measure of maximal entropy is a 1-step Markov measure whose transition
probability matrix is the stochasticization, as in (3.1), of the 0, 1 matrix that defines
the subshift. When studying factor maps $\pi : \Omega_A \to \Omega_B$ it is natural to look for
measures of maximal relative entropy, which we also call relatively maximal
measures: for fixed $\nu$ on $\Omega_B$, look for the $\mu \in \pi^{-1}\nu$ which have maximal entropy in that
fiber. Such measures always exist by compactness and upper semicontinuity, but,
in contrast to the Shannon-Parry case (when $\Omega_B$ consists of a single point), they
need not be unique. E.g., in Example 2.9, the two-to-one map $\pi$ respects entropy,
and for $p \ne 1/2$ there are exactly two ergodic measures (the Bernoulli measures
$\mu_p$ and $\mu_{1-p}$) which $\pi$ sends to $\nu_p$. Moreover, there exists some $V_p \in C(Y)$ which
has $\nu_p$ as a unique equilibrium state [52, 81], and $V_p \circ \pi$ has exactly two ergodic
equilibrium states, $\mu_p$ and $\mu_{1-p}$.

Here is a useful characterization of relatively maximal measures due to Shin. 

Theorem 3.14 [92]. Suppose that $\pi : X \to Y$ is a factor map of shifts of finite
type, $\nu \in \mathcal{M}(Y)$ is ergodic, and $\pi\mu = \nu$. Then $\mu$ is relatively maximal over $\nu$ if and
only if there is $V \in C(Y, \mathbb{R})$ such that $\mu$ is an equilibrium state of $V \circ \pi$.

If there is a locally constant saturated compensation function $G \circ \pi$, then every
Markov measure on $Y$ has a unique relatively maximal lift, which is Markov, because
then the relatively maximal measures over an equilibrium state of $V \in C(Y, \mathbb{R})$ are
the equilibrium states of $V \circ \pi + G \circ \pi$ [97]. Further, the measure of maximal
entropy $\max_X$ is the unique equilibrium state of the potential function 0 on $X$; and
the relatively maximal measures over $\max_Y$ are the equilibrium states of $G \circ \pi$.

It was proved in [79] that for each ergodic $\nu$ on $Y$, there are only a finite number
of relatively maximal measures over $\nu$. In fact, for a 1-block factor map $\pi$ between
1-step shifts of finite type $X, Y$, the number of ergodic invariant measures of maximal
entropy in the fiber $\pi^{-1}\{\nu\}$ is at most

(3.30) $N_\nu(\pi) = \min\{|\pi^{-1}\{b\}| : b \in A(Y), \ \nu[b] > 0\}$.

This follows from the theorem in [79] that for each ergodic $\nu$ on $Y$, any two
distinct ergodic measures on $X$ of maximal entropy in the fiber $\pi^{-1}\{\nu\}$ are relatively
orthogonal. This concept is defined as follows.

For $\mu_1, \dots, \mu_n \in \mathcal{M}(X)$ with $\pi\mu_i = \nu$ for all $i$, their relatively independent
joining $\hat\mu$ over $\nu$ is defined by:

if $A_1, \dots, A_n$ are measurable subsets of $X$ and $\mathcal{F}$ is the $\sigma$-algebra of $Y$, then

(3.31) $\hat\mu(A_1 \times \dots \times A_n) = \int_Y \prod_{i=1}^{n} E_{\mu_i}(1_{A_i} \,|\, \pi^{-1}\mathcal{F}) \circ \pi^{-1} \, d\nu$

in which $E$ denotes conditional expectation. Two ergodic measures $\mu_1, \mu_2$ with
$\pi\mu_1 = \pi\mu_2 = \nu$ are relatively orthogonal (over $\nu$), $\mu_1 \perp_\nu \mu_2$, if

(3.32) $(\mu_1 \otimes_\nu \mu_2)\{(u, v) \in X \times X : u_0 = v_0\} = 0$.

This means that with respect to the relatively independent joining or coupling,
there is zero probability of coincidence of symbols in the two coordinates.

That the second theorem (distinct ergodic relatively maximal measures in the
same fiber are relatively orthogonal) implies the first (no more than $N_\nu(\pi)$ relatively
maximal measures over $\nu$) follows from the Pigeonhole Principle. If we have
$n > N_\nu(\pi)$ ergodic measures $\mu_1, \dots, \mu_n$ on $X$, each projecting to $\nu$ and each of maximal
entropy in the fiber $\pi^{-1}\{\nu\}$, we form the relatively independent joining $\hat\mu$ on $X^n$
of the measures $\mu_i$ as above. Write $p_i$ for the projection $X^n \to X$ onto the $i$'th
coordinate. For $\hat\mu$-almost every $\hat x$ in $X^n$, $\pi(p_i(\hat x))$ is independent of $i$; abusing
notation for simplicity, denote it by $\pi(\hat x)$. Let $b$ be a symbol in the alphabet of
$Y$ such that $b$ has $N_\nu(\pi)$ preimages $a_1, \dots, a_{N_\nu(\pi)}$ under the block map $\pi$. Since
$n > N_\nu(\pi)$, for every $\hat x \in \pi^{-1}[b]$ there are $i \ne j$ with $(p_i \hat x)_0 = (p_j \hat x)_0$. At least one
of the sets $S_{i,j} = \{\hat x \in X^n : (p_i \hat x)_0 = (p_j \hat x)_0\}$ must have positive $\hat\mu$-measure, and
then also

(3.33) $(\mu_i \otimes_\nu \mu_j)\{(u, v) \in X \times X : \pi u = \pi v, \ u_0 = v_0\} > 0$,

contradicting relative orthogonality. (Briefly, if you have more measures than
preimage symbols, two of those measures have to coincide on one of the symbols:
with respect to each measure, that symbol a.s. appears infinitely many times in
the same place.)

The second theorem is proved by "interleaving" measures to increase entropy. If 
there are two relatively maximal measures over v which are not relatively orthogo- 
nal, then the measures can be 'mixed' to give a measure with greater entropy. We 
concatenate words from the two processes, using the fact that the two measures are 
supported on sequences that agree infinitely often. Since X is a 1-step SFT, we can 
switch over whenever a coincidence occurs. That the switching increases entropy 
is seen by using the strict concavity of the function $-t \log t$ and lots of calculations
with conditional expectations. 

Example 3.15. Here is an example (also discussed in [79, Example 1]) showing that
to find relatively maximal measures over a Markov measure it is not enough to
consider only sofic measures which map to it. We describe a factor map $\pi$ which
is both left and right e-resolving (see Section 6.1) and such that there is a unique
relatively maximal measure $\mu$ above any fully-supported Markov measure $\nu$, but
the measure $\mu$ is not Markov, and it is not even sofic.

We use vertex shifts of finite type. The alphabet for the domain subshift is
$\{a_1, a_2, b\}$ (in that order for indexing purposes), and the factor map (onto the
2-shift $(\Omega_2, \sigma)$) is the 1-block code $\pi$ which erases subscripts. The transition diagram
and matrix $A$ for the domain shift of finite type $(\Omega_A, \sigma)$ are

(3.34) [Transition diagram and matrix $A$ for $(\Omega_A, \sigma)$.]

Above the word $ba^nb$ in $\Omega_2$ there are $n + 1$ words in $\Omega_A$: above $a^n$ we see $k$ $a_1$'s
followed by $n - k$ $a_2$'s, where $0 \le k \le n$. Let us for simplicity consider the maximal
measure $\nu$ on $(\Omega_2, \sigma)$; so, $\nu(C_0(ba^nb)) = 2^{-n-2}$. Now the maximal entropy lift $\mu$
of $\nu$ will assign equal measure $2^{-(n+2)}/(n+1)$ to each of the preimage blocks of
$ba^nb$. If $\mu$ is sofic, then (as in Sec. 4.1.4) there are vectors $u, v$ and a square matrix
$Q$ such that $\mu(C_0(b(a_1)^n b)) = uQ^n v$ for all $n > 0$. Then the function $n \mapsto uQ^n v$ is
some finite sum of terms of the form $r n^j \lambda^n$ where $j \in \mathbb{Z}_+$ and $r, \lambda$ are constants.
The function $n \mapsto 2^{-(n+2)}/(n+1)$ is not a function of this type.
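The fiber count used in Example 3.15 can be checked by brute force. The sketch below is only an illustration: the adjacency matrix `A` is an assumption, inferred from the description of the fibers (an $a_2$ is never followed by an $a_1$, and all other transitions are allowed), and the names `preimage_words` and `lifted_block_measure` are hypothetical.

```python
from itertools import product

# ASSUMPTION: adjacency matrix inferred from the fiber description in the text
# (a2 may not be followed by a1); it is not copied from display (3.34).
SYMBOLS = ["a1", "a2", "b"]
A = [[1, 1, 1],
     [0, 1, 1],
     [1, 1, 1]]
ERASE = {"a1": "a", "a2": "a", "b": "b"}  # the 1-block code erasing subscripts

def preimage_words(target):
    """All A-allowed words over SYMBOLS whose subscript-erased image is `target`."""
    words = []
    for idx in product(range(3), repeat=len(target)):
        allowed = all(A[i][j] for i, j in zip(idx, idx[1:]))
        image = "".join(ERASE[SYMBOLS[i]] for i in idx)
        if allowed and image == target:
            words.append(tuple(SYMBOLS[i] for i in idx))
    return words

def lifted_block_measure(n):
    """The value 2^{-(n+2)}/(n+1) that the text assigns to each preimage of b a^n b."""
    return 2.0 ** (-(n + 2)) / (n + 1)

for n in range(1, 6):
    count = len(preimage_words("b" + "a" * n + "b"))
    print(n, count, lifted_block_measure(n))
```

Under the assumed matrix the count is $n + 1$ for each $n$ tried, and the $1/(n+1)$ factor in the printed measures is the feature that cannot be matched by a finite sum of terms $r n^j \lambda^n$ with $j \in \mathbb{Z}_+$.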

Problem 3.16. Is it true that for every factor map $\pi : \Omega_A \to \Omega_B$ every (fully
supported) Markov measure $\nu$ on $\Omega_B$ has a unique relatively maximal measure
that maps to it, and this is also a measure with full support?

Remark 3.17. After the original version of this paper was posted on the Math Arxiv
and submitted for review, we received the preprint [100] of Jisang Yoo containing
the following result: "Given a factor map from an irreducible SFT $X$ to a sofic
shift $Y$ and an invariant measure $\nu$ on $Y$ with full support, every measure on $X$
of maximal relative entropy over $\nu$ is fully supported." This solves half of Problem
3.16.

3.6. Finite-to-one codes. Suppose $\pi : \Omega_A \to \Omega_B$ is a finite-to-one factor map of
irreducible shifts of finite type. There are some special features of this case which
we collect here for mention. Without loss of generality, after recoding we assume
that $\pi$ is a 1-block code. Given a Markov measure $\mu$ and a periodic point $x$ we
define the weight-per-symbol of $x$ (with respect to $\mu$) to be

(3.35) $\mathrm{wps}_\mu(x) := \lim_{n \to \infty} \frac{1}{n} \log \mu\{y : x_i = y_i, \ 0 \le i < n\}$.

Proposition 3.18. Suppose $\pi : \Omega_A \to \Omega_B$ is a finite-to-one factor map of
irreducible shifts of finite type. Then

(1) The measure of maximal entropy on $\Omega_B$ lifts to the measure of maximal
entropy on $\Omega_A$.

(2) Every Markov measure on $\Omega_B$ lifts to a unique Markov measure of equal
order on $\Omega_A$.

(3) If $\mu, \nu$ are Markov measures on $\Omega_A, \Omega_B$ respectively, then the following are
equivalent:

(a) $\pi\mu = \nu$

(b) for every periodic point $x$ in $\Omega_A$, $\mathrm{wps}_\mu(x) = \mathrm{wps}_\nu(\pi x)$.

Proofs can be found in, for example, [56]. For infinite-to-one codes, we do not 
know an analogue of Prop. 3.18 (3). 
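For a 1-step Markov measure the limit in (3.35) can be computed from one period of the point: the measure of the cylinder of length $n$ is an initial probability times a product of transition probabilities, so the limit is the average of $\log P$ over the transitions of the cycle. The sketch below assumes (this is not stated in the text) that $\mu$ is given by a stochastic matrix `P` whose stationary vector is positive on the states visited; the name `wps` simply mirrors the notation.

```python
import math

def wps(P, cycle):
    """Weight-per-symbol of the periodic point (cycle)* for the 1-step Markov
    measure with stochastic matrix P (assumed input): the average of log P
    over the transitions of one period, including the wrap-around transition."""
    p = len(cycle)
    total = 0.0
    for k in range(p):
        prob = P[cycle[k]][cycle[(k + 1) % p]]
        if prob == 0.0:
            return float("-inf")  # the periodic point is not in the support
        total += math.log(prob)
    return total / p

# Toy stochastic matrix (an assumption for illustration only).
P_toy = [[0.5, 0.5],
         [0.25, 0.75]]
print(wps(P_toy, [0]), wps(P_toy, [0, 1]))
```

Condition (3)(b) of Proposition 3.18 can then be tested numerically on any finite list of periodic points, which gives a necessary check for $\pi\mu = \nu$ but not a proof.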

3.7. The semigroup measures of Kitchens and Tuncel. There is a hierarchy
of sofic measures according to their sofic degree. Among the degree-1 sofic measures,
there is a distinguished and very well behaved subclass, properly containing the
Markov measures. These are the semigroup measures introduced and studied by
Kitchens and Tuncel in their memoir [57]. Roughly speaking, semigroup measures
are to Markov measures as sofic subshifts are to SFT's.

A sofic subshift can be presented by a semigroup [98, 57]. Associated to this are
nonnegative transition matrices $R_0, L_0$. A semigroup measure (for the semigroup
presentation) is defined by a state probability vector and a pair of stochastic
matrices $R, L$ with $0/+$ pattern matching $R_0, L_0$ and satisfying certain consistency
conditions. These matrices can be multiplied to compute measures of cylinders. A
measure is a semigroup measure if there exist a semigroup and apparatus as above
which can present it. We will not review this constructive part of the theory, but
we mention some alternate characterizations of these measures.

For a sofic measure $\mu$ on $X$ and a periodic point $x$ in $X$, the weight-per-symbol
of $x$ with respect to $\mu$ is still well defined by (3.35). Let us say a factor map
$\pi$ respects $\mu$-weights if whenever $x, y$ are periodic points with the same image we
have $\mathrm{wps}_\mu(x) = \mathrm{wps}_\mu(y)$. Given a word $U = U[-n \dots 0]$ and a measure $\mu$, let $\mu_U$
denote the conditional measure on the future, i.e. if $UW$ is an allowed word then
$\mu_U(W) = \mu(UW)/\mu(U)$.

Theorem 3.19. [57] Let $\nu$ be a shift-invariant measure on an irreducible sofic
subshift $Y$. Then the following are equivalent:

(1) $\nu$ is a semigroup measure.

(2) $\nu$ is the image of a Markov measure $\mu$ under a finite-to-one factor map
which respects $\mu$-weights.

(3) $\nu$ is the image of a Markov measure $\mu$ under a degree 1 resolving factor
map which respects $\mu$-weights.

(4) The collection of conditional measures $\mu_U$, as $U$ ranges over all $Y$-words,
is finite.

There is also a thermodynamic characterization of these measures as unique 
equilibrium states of bounded Borel functions which are locally constant on doubly 
transitive points, very analogous to the characterization of Markov measures as 
unique equilibrium states of continuous locally constant functions. The semigroup 
measures satisfy other nice properties as well. 

Theorem 3.20. [57] Suppose $\pi : X \to Y$ is a finite-to-one factor map of irreducible
sofic subshifts and $\mu$ and $\nu$ are semigroup measures on $X$ and $Y$ respectively. Then

(1) $\nu$ lifts by $\pi$ to a unique semigroup measure on $X$, and this is the unique
ergodic measure on $X$ which maps to $\nu$;

(2) $\pi\mu$ is a semigroup measure if and only if $\pi$ respects $\mu$-weights;

(3) there is an irreducible sofic subshift $X'$ of $X$ such that $\pi$ maps $X'$ finite-to-one
onto $Y$ [69], and therefore $\nu$ lifts to a semigroup measure on $X'$.

In contrast to the last statement, it can happen for an infinite-to-one factor
map between irreducible SFTs that there is a Markov measure on the range which
cannot lift to a Markov measure on any subshift of the domain [69].

We finish here with an example. There are others in [57]. 






Example 3.21. This is an example of a finite-to-one, one-to-one a.e. 1-block code
$\pi : \Omega_A \to \Omega_B$ between mixing vertex shifts of finite type, with a 1-step Markov
measure $\mu$ on $\Omega_A$, such that the following hold:

(1) For all periodic points $x, y$ in $\Omega_A$, $\pi x = \pi y$ implies that $\mathrm{wps}_\mu(x) = \mathrm{wps}_\mu(y)$.

(2) $\pi\mu$ is not Markov on $\Omega_B$.

Here the alphabet of $\Omega_A$ is $\{1, 2, 3\}$; the alphabet of $\Omega_B$ is $\{1, 2\}$;

$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$

and $\pi$ is the 1-block code sending 1 to 1 and sending 2 and 3 to 2. The map $\pi$
collapses the points in the orbit of $(23)^*$ to a fixed point and collapses no other
periodic points. (Given a block $B$, we let $B^*$ denote a periodic point obtained by
infinite concatenation of the block $B$.)

Let $f$ be the function on $\Omega_A$ such that $f(x) = \log 2$ if $x_0x_1 = 23$, $f(x) = \log(1/2)$
if $x_0x_1 = 32$ and $f(x) = 0$ otherwise. Let $\mu$ be the 1-step Markov measure which is
the unique equilibrium state for $f$, defined by the stochasticization $P$ of the matrix

$M = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 2 \\ 1 & 1/2 & 0 \end{pmatrix}$.

Let $\lambda$ denote the spectral radius of $M$. Suppose that $\nu = \pi\mu$ is Markov, of any
order. Then $\mathrm{wps}_\nu(2^*) = \mathrm{wps}_\mu((23)^*) = -\log \lambda$. Also, there must be a constant $c$
such that for all large $n$,

(3.36) $\mathrm{wps}_\nu((12^n)^*) = \frac{1}{n+1}\big(c + (n+1)\,\mathrm{wps}_\nu(2^*)\big) = \frac{c}{n+1} - \log \lambda$.

So, for all large $n$,

(3.37) $\frac{c}{2n+1} - \log \lambda = \mathrm{wps}_\nu((12^{2n})^*) = \mathrm{wps}_\mu((1(23)^n)^*) = \frac{1}{2n+1} \log\big(2\lambda^{-(2n+1)}\big)$

and

$\frac{c}{2n+2} - \log \lambda = \mathrm{wps}_\nu((12^{2n+1})^*) = \mathrm{wps}_\mu((1(23)^n 2)^*) = \frac{1}{2n+2} \log\big(\lambda^{-(2n+2)}\big)$.

Thus $c = \log 2$ and $c = 0$, a contradiction. Therefore $\pi\mu$ is not Markov.
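The two incompatible values of $c$ can be observed numerically. The sketch below is an illustration under stated assumptions: the matrix `M` is read off from the definition of $f$ in Example 3.21, the stochasticization is taken in the standard form $P(i,j) = M(i,j)\,r_j/(\lambda r_i)$ with $r$ a positive right eigenvector (assumed here to be what (3.1) denotes), and the helper names `perron`, `stochasticize`, `c_even`, `c_odd` are hypothetical. States 1, 2, 3 of the text are indices 0, 1, 2.

```python
import math

M = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [1.0, 0.5, 0.0]]

def perron(M, iterations=2000):
    """Power iteration: spectral radius and a positive right eigenvector of M
    (M is assumed primitive, which holds for the matrix above)."""
    size = len(M)
    r = [1.0] * size
    lam = 1.0
    for _ in range(iterations):
        s = [sum(M[i][j] * r[j] for j in range(size)) for i in range(size)]
        lam = max(s)
        r = [v / lam for v in s]
    return lam, r

def stochasticize(M):
    """Assumed form of the stochasticization: P(i,j) = M(i,j) r_j / (lam r_i)."""
    lam, r = perron(M)
    size = len(M)
    P = [[M[i][j] * r[j] / (lam * r[i]) for j in range(size)] for i in range(size)]
    return lam, P

def wps(P, cycle):
    """Average of log P over one period of the periodic point (cycle)*."""
    p = len(cycle)
    return sum(math.log(P[cycle[k]][cycle[(k + 1) % p]]) for k in range(p)) / p

lam, P = stochasticize(M)

def c_even(n):
    """Value of c forced by (1(23)^n)*, whose image is (1 2^{2n})*."""
    cycle = [0] + [1, 2] * n
    return len(cycle) * (wps(P, cycle) + math.log(lam))

def c_odd(n):
    """Value of c forced by (1(23)^n 2)*, whose image is (1 2^{2n+1})*."""
    cycle = [0] + [1, 2] * n + [1]
    return len(cycle) * (wps(P, cycle) + math.log(lam))

print(c_even(3), math.log(2), c_odd(3))
```

The even and odd families give values close to $\log 2$ and $0$ respectively, independently of $n$, which is the numerical shadow of the contradiction derived in the example.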



4. Identification of hidden Markov measures 



Given a finite-state stationary process, how can we tell whether it is a hidden 
Markov process? If it is, how can we construct some Markov process of which it is 
a factor by means of a sliding block code? When is the image of a Markov mea- 
sure under a factor map again a Markov measure? These questions are of practical 
importance, since scientific measurements often capture only partial information 
about systems under study, and in order to construct useful models the significant 
hidden variables must be identified and included. Beginning in the 1960's some
criteria were developed for recognizing a hidden Markov process: loosely speaking,
an abstract algebraic object constructed from knowing the measures of cylinder sets
should be in some sense finitely generated. Theorem 4.20 below gives equivalent
conditions, in terms of formal languages and series (the series is "rational"), linear
algebra (the measure is "linearly representable"), and abstract algebra (some
module is finitely generated), that a shift-invariant probability measure be the image
under a 1-block map of a shift-invariant 1-step Markov measure. In the following
we briefly explain this result, including the terminology involved.

Kleene [59] characterized rational languages as the linearly representable ones,
and this was generalized to formal series by Schützenberger [89]. In the study
of stochastic processes, functions of Markov chains were analyzed by Gilbert [40],
Furstenberg [39], Dharmadhikari [23, 24, 25, 26, 27, 28], Heller [48, 49], and
others. For the connection between rational series and continuous images of Markov
chains, we follow Berstel-Reutenauer [9] and Hansel-Perrin [46], with an addition
to explain how to handle zero entries. Subsequent sections describe the approaches
of Furstenberg and Heller and related work.

Various problems around these ideas were (and continue to be) explored and
solved. In particular, it is natural to ask when is the image of a Markov measure $\mu$
under a continuous factor map $\pi$ a Gibbs measure (see (3.12)), or when is the image
of a Gibbs measure again a Gibbs measure? Chazottes and Ugalde [21] showed
that if $\mu$ is $k$-step Markov on a full shift $\Omega_d$ and $\pi$ maps $\Omega_d$ onto another full shift
$\Omega_{d'}$, then the image $\pi\mu$ is a Gibbs measure which is the unique equilibrium state of
a Hölder continuous potential which can be explicitly described in terms of a limit
of matrix products and computed at periodic points. They also gave sufficient
conditions in the more general case when the factor map is between SFT's. The
case when $\mu$ is Gibbs but not necessarily Markov is considered in [22]. For
higher-dimensional versions see for example [63, 68, 43].

Among the extensive literature that we do not cite elsewhere, we can mention 
in addition [47, 70, 35, 10, 88]. 

4.1. Formal series and formal languages. 

4.1.1. Basic definitions. As in Section 2.1, continue to let $A$ be a finite alphabet,
$A^*$ the set of all finite words on $A$, and $A^+$ the set of all finite nonempty words on
$A$. Let $\epsilon$ denote the empty word. A language on $A$ is any subset $\mathcal{L} \subset A^*$.

Recall that a monoid is a set $S$ with a binary operation $S \times S \to S$ which is
associative and has a neutral element (identity). This means we can think of $A^*$ as
the multiplicative free monoid generated by $A$, where the operation is concatenation
and the neutral element is $\epsilon$.

A formal series (nonnegative real-valued, based on $A$) is a function $s : A^* \to \mathbb{R}_+$.
For all $w \in A^*$, $s(w) = (s, w) \in \mathbb{R}_+$, which can be thought of as the coefficient of $w$
in the series $s$. We will think of this $s$ as $\sum_{w \in A^*} s(w) w$, and this will be justified
later. If $v \in A^*$ and $s$ is the series such that $s(v) = 1$ and $s(w) = 0$ otherwise, then
we sometimes use simply $v$ to denote $s$.

Associated with any language $\mathcal{L}$ on $A$ is its characteristic series $F_{\mathcal{L}} : A^* \to \mathbb{R}_+$
which assigns 1 to each word in $\mathcal{L}$ and 0 to each word in $A^* \setminus \mathcal{L}$. Associated to any
Borel measure $\mu$ on $A^{\mathbb{Z}_+}$ is its corresponding series $F_\mu$ defined by

(4.1) $F_\mu(w) = \mu(C_0(w)) = \mu\{x \in A^{\mathbb{Z}_+} : x[0, |w| - 1] = w\}$.



It is sometimes useful to consider formal series with values in any semiring $K$,
which is just a ring without subtraction. That is, $K$ is a set with operations $+$
and $\cdot$ such that $(K, +)$ is a commutative monoid with identity element 0; $(K, \cdot)$ is
a monoid with identity element 1; the product distributes over the sum; and for
$k \in K$, $0k = k0 = 0$.

We denote the set of all $K$-valued formal series based on $A$ by $K\langle\langle A \rangle\rangle$ or $\mathcal{F}_K(A)$.
We further abbreviate $\mathbb{R}_+\langle\langle A \rangle\rangle = \mathcal{F}(A)$.

Then $\mathcal{F}(A)$ is a semiring in a natural way: For $f_1, f_2 \in \mathcal{F}(A)$, define

(1) $(f_1 + f_2)(w) = f_1(w) + f_2(w)$

(2) $(f_1 f_2)(w) = \sum f_1(u) f_2(v)$, where the sum is over all $u, v \in A^*$ such that
$uv = w$, a finite sum.

The neutral element for multiplication in $\mathcal{F}(A)$ is

(4.2) $s_1(w) = 1$ if $w = \epsilon$, and $s_1(w) = 0$ otherwise.

As discussed above, we will usually write simply $\epsilon$ for $s_1$. There is a natural injection
$\mathbb{R}_+ \hookrightarrow \mathcal{F}(A)$ defined by $t \mapsto t\epsilon$ for all $t \in \mathbb{R}_+$.

Note that:

$\mathbb{R}_+$ acts on $\mathcal{F}(A)$ on both sides:
$(ts)(w) = t\,s(w)$, $(st)(w) = s(w)\,t$, for all $w \in A^*$, for all $t \in \mathbb{R}_+$.

There is a natural injection $A^* \hookrightarrow \mathcal{F}(A)$ as a multiplicative submonoid:
For $w \in A^*$ and $v \in A^*$, define

$w(v) = \delta_{wv} = 1$ if $w = v$, and $0$ otherwise.

This is a 1-term series.

Definition 4.1. The support of a formal series $s \in \mathcal{F}(A)$ is

$\mathrm{supp}(s) = \{w \in A^* : s(w) \ne 0\}$.

Note that $\mathrm{supp}(s)$ is a language. A language corresponds to a series with coefficients
0 and 1, namely its characteristic series.

Definition 4.2. A polynomial is an element of $\mathcal{F}(A)$ whose support is a finite
subset of $A^*$. Denote the $K$-valued polynomials based on $A$ by $\mathcal{P}_K(A) = K\langle A \rangle$.
The degree of a polynomial $p$ is $\deg(p) = \max\{|w| : p(w) \ne 0\}$ and is $-\infty$ if $p = 0$.

Definition 4.3. A family $\{f_\lambda : \lambda \in \Lambda\} \subset \mathcal{F}(A)$ of series is called locally finite if
for all $w \in A^*$ there are only finitely many $\lambda \in \Lambda$ for which $f_\lambda(w) \ne 0$. A series
$f \in \mathcal{F}(A)$ is called proper if $f(\epsilon) = 0$.

Proposition 4.4. If $f \in \mathcal{F}(A)$ is proper, then $\{f^n : n = 0, 1, 2, \dots\}$ is locally
finite.

Proof. If $n > |w|$, then $f^n(w) = 0$, because

$f^n(w) = \sum_{u_1 \cdots u_n = w} f(u_1) \cdots f(u_n)$

and at least one $u_i$ is $\epsilon$. □

Definition 4.5. If $f \in \mathcal{F}(A)$ is proper, define

$f^* = \sum_{n=0}^{\infty} f^n$ and $f^+ = \sum_{n=1}^{\infty} f^n$ (a pointwise finite sum),

with $f^0 = 1 = 1 \cdot \epsilon = \epsilon$.



4.1.2. Rational series and languages. 

Definition 4.6. The rational operations in $\mathcal{F}(A)$ are sum ($+$), product ($\cdot$),
multiplication by real numbers ($tw$), and $* : f \to f^*$. The family of rational series consists
of those $f \in \mathcal{F}(A)$ that can be obtained by starting with a finite set of polynomials
in $\mathcal{F}(A)$ and applying a finite number of rational operations.

Definition 4.7. A language $\mathcal{L} \subset A^*$ is rational if and only if its characteristic
series

(4.3) $F(w) = 1$ if $w \in \mathcal{L}$, and $F(w) = 0$ if $w \notin \mathcal{L}$

is rational.

Recall that regular languages correspond to regular expressions: The set of
regular expressions includes $A$, $\epsilon$, $\emptyset$, and is closed under $+$, $\cdot$, $*$. A language
recognizable by a finite-state automaton, or consisting of words obtained by reading off
sequences of edge labels on a finite labeled directed graph, is regular.

Proposition 4.8. A language $\mathcal{L}$ is rational if and only if it is regular. Thus a
nonempty insertive and extractive language is rational if and only if it is the
language of a sofic subshift.

4.1.3. Distance and topology in $\mathcal{F}(A)$. If $f_1, f_2 \in \mathcal{F}(A)$, define

(4.4) $D(f_1, f_2) = \inf\{n \ge 0 : \text{there is } w \in A^n \text{ such that } f_1(w) \ne f_2(w)\}$

and

(4.5) $d(f_1, f_2) = 2^{-D(f_1, f_2)}$.

With respect to the metric $d$, $f_k \to f$ if and only if for each $w \in A^*$, $f_k(w) \to f(w)$
in the discrete topology on $\mathbb{R}$, i.e. $f_k(w)$ eventually equals $f(w)$.

Proposition 4.9. $\mathcal{F}(A)$ is complete with respect to the metric $d$ and is a
topological semiring with respect to the metric $d$ (that is, $+$ and $\cdot$ are continuous as
functions of two variables).

Definition 4.10. A family $\{F_\lambda : \lambda \in \Lambda\}$ of formal series is called summable if there
is a series $F \in \mathcal{F}(A)$ such that for every $\delta > 0$ there is a finite set $\Lambda_\delta \subset \Lambda$ such
that for each finite set $I \subset \Lambda$ with $\Lambda_\delta \subset I$, $d(\sum_{i \in I} F_i, F) < \delta$. Then $F$ is called the
sum of the series and we write $F = \sum_{\lambda \in \Lambda} F_\lambda$.

Proposition 4.11. If $\{F_\lambda : \lambda \in \Lambda\}$ is locally finite, then it is summable, and
conversely.

Thus any $F \in \mathcal{F}(A)$ can be written as $F = \sum_{w \in A^*} F(w) w$, where the formal
series is a convergent infinite series of polynomials in the metric of $\mathcal{F}(A)$. Recall
that

$(F(w)w)(v) = F(w)$ if $w = v$, and $0$ otherwise,

where $F(w)w \in \mathcal{F}(A)$ and $w \in A^*$, so that $\{F(w)w : w \in A^*\}$ is a locally finite,
and hence summable, subfamily of $\mathcal{F}(A)$.

We note here that the set $\mathcal{P}(A)$ of all polynomials is dense in $\mathcal{F}(A)$.



4.1.4. Recognizable (linearly representable) series. 

Definition 4.12. $F \in \mathcal{F}(A)$ is linearly representable if there exists an $n \ge 1$ (the
dimension of the representation) such that there are a $1 \times n$ nonnegative row vector
$x \in \mathbb{R}_+^n$, an $n \times 1$ nonnegative column vector $y \in \mathbb{R}_+^n$, and a morphism of
multiplicative monoids $\phi : A^* \to \mathbb{R}_+^{n \times n}$ (the multiplicative monoid of nonnegative $n \times n$
matrices) such that for all $w \in A^*$, $F(w) = x\phi(w)y$ (matrix multiplication). A
linearly representable measure is one whose associated series is linearly representable.
The triple $(x, \phi, y)$ is called the linear representation of the series (or measure).



Note that the distance $d(f_1, f_2)$ of Section 4.1.3 defines an ultrametric on $\mathcal{F}(A)$:

(4.6) $d(f, h) \le \max\{d(f, g), d(g, h)\} \le d(f, g) + d(g, h)$.

Example 4.13. Consider a Bernoulli measure $B(p_0, p_1, \dots, p_{d-1})$ on $\Omega_+(A) = A^{\mathbb{Z}_+}$
where $A = \{a_0, a_1, \dots, a_{d-1}\}$, and $p = (p_0, p_1, \dots, p_{d-1})$ is a probability vector.
Let $f = \sum_{i=0}^{d-1} p_i a_i \in \mathcal{F}(A)$. Then

$f(w) = p_i$ if $w = a_i$, and $f(w) = 0$ if $w \ne a_i$ for every $i$.

Define $F_p = f^* = \sum_{n \ge 0} f^n$. Note that $f$ is proper since we have $f(\epsilon) = 0$. Consider
the particular word $w = a_2a_0$. Then $f^0(w) = f(w) = 0$, and for $n \ge 3$, we have
$f^n(w) = 0$ because any factorization $w = u_1u_2u_3$ includes $\epsilon$ and $f(\epsilon) = 0$. Thus
$F_p(w) = f^*(w) = f^2(w) = \sum_{uv=w} f(u)f(v) = f(a_2)f(a_0) = p_2p_0$. Continuing in
this way, we see that for $w_i \in A$, $F_p(w_1w_2 \cdots w_n) = p_{w_1}p_{w_2} \cdots p_{w_n}$.
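The computation in Example 4.13 can be replayed mechanically. The following minimal sketch represents a series as a Python function on words (tuples of symbol indices) and evaluates the Cauchy product and the star pointwise; the names `cauchy_product`, `power`, `star` and the probability vector `p` are assumptions made for the illustration. The recursion is exponential in the word length and is meant only for very short words.

```python
def cauchy_product(f, g, w):
    """(f g)(w): sum of f(u) g(v) over all factorizations w = u v."""
    return sum(f(w[:k]) * g(w[k:]) for k in range(len(w) + 1))

def power(f, n, w):
    """f^n(w), with f^0 equal to 1 on the empty word and 0 elsewhere."""
    if n == 0:
        return 1.0 if len(w) == 0 else 0.0
    return cauchy_product(f, lambda u: power(f, n - 1, u), w)

def star(f, w):
    """f^*(w) for a proper series f: powers n > |w| vanish (Proposition 4.4)."""
    return sum(power(f, n, w) for n in range(len(w) + 1))

# Bernoulli data (assumed for illustration): symbol a_i is encoded as index i.
p = (0.5, 0.3, 0.2)

def f(w):
    """The proper series f = sum_i p_i a_i: nonzero only on one-letter words."""
    return p[w[0]] if len(w) == 1 else 0.0

print(star(f, (2, 0)), p[2] * p[0])
```

Truncating the star at $n = |w|$ is justified only because $f$ is proper; for a series with $f(\epsilon) \ne 0$ the sum would not be pointwise finite.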

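Example 4.14. Consider a Markov measure $\mu$ on $\Omega_+(A)$ defined by a $d \times d$ stochastic
matrix $P$ and a $d$-dimensional probability row vector $p = (p_0, p_1, \dots, p_{d-1})$. Define
$F_{P,p} \in \mathcal{F}(A)$ by $F_{P,p}(w_1 \dots w_n) = \mu(C_0(w_1 \dots w_n))$ for all $w_1, \dots, w_n \in A$. Put
$y = (1, \dots, 1)^{tr} \in \mathbb{R}_+^d$ and $x = p \in \mathbb{R}_+^d$, and let $\phi$ be generated by $\phi(a_j)$,
$j = 0, 1, \dots, d-1$, where

(4.7) $\phi(a_j) = \begin{pmatrix} 0 & \cdots & P_{0j} & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & P_{ij} & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & P_{d-1,j} & \cdots & 0 \end{pmatrix}$ for each $a_j \in A$.

Then the triple $(x, \phi, y)$ represents the given Markov measure $\mu$. In this Markov
case each matrix $\phi(a_j)$ has at most one nonzero column and thus has rank at most
1.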
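The representation of Example 4.14 can be compared with the usual Markov formula $p_{w_1}P_{w_1w_2}\cdots P_{w_{n-1}w_n}$. The sketch below assumes that `p` is a stationary vector for `P` (so that the measure is shift invariant); the matrix, the vector and the helper names `phi`, `series_value`, `markov_cylinder` are illustrative assumptions.

```python
def phi(P, j):
    """phi(a_j): the matrix keeping column j of P and zeroing all other columns."""
    d = len(P)
    return [[P[i][k] if k == j else 0.0 for k in range(d)] for i in range(d)]

def vec_mat(x, M):
    """Row vector times matrix."""
    return [sum(x[i] * M[i][k] for i in range(len(x))) for k in range(len(M[0]))]

def series_value(p, P, word):
    """x phi(w_1) ... phi(w_n) y with x = p and y the all-ones column vector."""
    x = list(p)
    for j in word:
        x = vec_mat(x, phi(P, j))
    return sum(x)

def markov_cylinder(p, P, word):
    """p_{w_1} P_{w_1 w_2} ... P_{w_{n-1} w_n}, the Markov measure of the cylinder."""
    value = p[word[0]]
    for a, b in zip(word, word[1:]):
        value *= P[a][b]
    return value

# Toy data (assumed): P is stochastic and p = (1/3, 2/3) satisfies pP = p.
P_toy = [[0.5, 0.5],
         [0.25, 0.75]]
p_toy = [1.0 / 3.0, 2.0 / 3.0]
print(series_value(p_toy, P_toy, (0, 1, 1, 0)), markov_cylinder(p_toy, P_toy, (0, 1, 1, 0)))
```

The agreement of the two values depends on stationarity of `p`: the first factor $x\phi(a_{w_1})$ has the entry $\sum_i p_i P_{i w_1}$, which equals $p_{w_1}$ only when $pP = p$.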

Example 4.15. Now we show how to obtain a linear representation of a sofic measure
that is the image under a 1-block map $\pi$ of a 1-step Markov measure. Let $\mu$ be a
1-step Markov measure determined by a $d \times d$ stochastic matrix $P$ and fixed vector
$p$ as in Example 4.14. Let $\pi : X \to Y$ be a 1-block map from the SFT $X$ to a
subshift $Y$. For each $a$ in the alphabet $B = A(Y)$ let $P_a$ be the $d \times d$ matrix such
that

(4.8) $P_a(i, j) = P(i, j)$ if $\pi(j) = a$, and $P_a(i, j) = 0$ otherwise.

Thus $P_a$ just zeroes out all the columns of $P$ except the ones corresponding to
indices in the $\pi$-preimage of the symbol $a$ in the alphabet of $Y$. Again let $y =
(1, \dots, 1)^{tr}$. For each $a \in B$ define $\phi(a) = P_a$. That the $\nu$-measure of each cylinder
in $Y$ is the sum of the $\mu$-measures of its preimages under $\pi$ says that the triple
$(x, \phi, y)$ represents $\nu = \pi\mu$.
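The last sentence of Example 4.15 can be checked directly on a small case by comparing the matrix product with a brute-force sum over the fiber. In this sketch the stochastic matrix `P_toy` (chosen doubly stochastic so that the uniform vector is stationary), the code `pi_toy`, and the function names are all assumptions made for illustration.

```python
from itertools import product

def image_measure(p, P, pi, word):
    """x P_{a_1} ... P_{a_n} y with x = p, y all ones, and P_a as in (4.8)."""
    d = len(P)
    x = list(p)
    for a in word:
        # multiply by P_a: keep column j only when pi[j] == a
        x = [sum(x[i] * P[i][j] for i in range(d)) if pi[j] == a else 0.0
             for j in range(d)]
    return sum(x)

def fiber_sum(p, P, pi, word):
    """Sum of the Markov measures of all domain words mapped onto `word`."""
    d = len(P)
    total = 0.0
    for states in product(range(d), repeat=len(word)):
        if all(pi[s] == a for s, a in zip(states, word)):
            value = p[states[0]]
            for s, t in zip(states, states[1:]):
                value *= P[s][t]
            total += value
    return total

# Toy data (assumed): doubly stochastic P, uniform stationary p, states 0,1 -> 0, state 2 -> 1.
P_toy = [[0.2, 0.3, 0.5],
         [0.5, 0.2, 0.3],
         [0.3, 0.5, 0.2]]
p_toy = [1.0 / 3.0] * 3
pi_toy = [0, 0, 1]
print(image_measure(p_toy, P_toy, pi_toy, (0, 1, 0)), fiber_sum(p_toy, P_toy, pi_toy, (0, 1, 0)))
```

The matrix product costs on the order of $n d^2$ operations for a word of length $n$, whereas the fiber sum grows exponentially with $n$; this is one practical reason for working with the linear representation.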



In working with linearly representable measures, it is useful to know that the
nature of the vectors and matrix involved in the representation can be assumed
to have a particular restricted form. Below, we say a matrix $P$ is a direct sum
of irreducible stochastic matrices if the index set for the rows and columns of $P$
is the disjoint union of sets for which the associated principal submatrices of $P$
are irreducible stochastic matrices. (Equivalently, there are irreducible stochastic
matrices $P_1, \dots, P_k$ and a permutation matrix $Q$ such that $QPQ^{-1}$ is the block
diagonal matrix whose successive diagonal blocks are $P_1, \dots, P_k$.)

Proposition 4.16. A formal series $F \in \mathcal{F}(A)$ corresponds to a linearly
representable shift-invariant probability measure $\mu$ on $\Omega_+(A)$ if and only if $F$ has a
linear representation $(x, \phi, y)$ with $P = \sum_{a \in A} \phi(a)$ a stochastic matrix, $y$ a column
vector of all 1's, and $xP = x$. Moreover, in this case the vector $x$ can be chosen to
be positive with the matrix $P$ a direct sum of irreducible stochastic matrices.

Proof. It is straightforward to check that any $(x, \phi, y)$ of the specified form
linearly represents a shift-invariant measure. Conversely, given a linear
representation $(x, \phi, y)$ as in Definition 4.12 of a shift-invariant probability measure $\mu$,
define $P = \sum_{a \in A} \phi(a)$ and note that, by induction, for all $w \in A^*$, $\mu(C_0(w)) =
x\phi(w)P^k y = xP^k\phi(w)y$ for all natural numbers $k$.
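For the reader's convenience, here is a sketch of the step $k = 1$ of that induction, written out under the standing hypotheses (the representation satisfies $\mu(C_0(u)) = x\phi(u)y$ for every word $u$, and $\mu$ is shift invariant). The first line uses only additivity of $\mu$ over the one-symbol extensions of $w$ to the right; the second uses shift invariance for the extensions to the left.

```latex
\begin{align*}
x\,\phi(w)\,P\,y &= \sum_{a \in A} x\,\phi(w)\phi(a)\,y
                  = \sum_{a \in A} \mu(C_0(wa)) = \mu(C_0(w)), \\
x\,P\,\phi(w)\,y &= \sum_{a \in A} x\,\phi(a)\phi(w)\,y
                  = \sum_{a \in A} \mu(C_0(aw)) = \mu(\sigma^{-1}C_0(w)) = \mu(C_0(w)).
\end{align*}
```

Applying the same two identities repeatedly, with $w$ replaced by its extensions, gives the statement for every $k$.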

Next, one shows that it is possible to reduce to a linear representation $(x, \phi, y)$ of
$\mu$ such that each entry of $x$ and $y$ is nonzero, and, with $P$ defined as $P = \sum_{a \in A} \phi(a)$,
$xP = x$ and $Py = y$. This requires some care. If indices corresponding to entries
in $x$ or $y$, or to rows or columns in $P$, are jettisoned nonchalantly, the resulting
new $\phi$ may no longer be a morphism.

Definition 4.17. A triple $(x', \phi', y')$ is obtained from $(x, \phi, y)$ by deleting a set $I$
of indices if the following holds: the indices for $(x, \phi, y)$ are the disjoint union of
the set $I$ and the indices for $(x', \phi', y')$; and for every symbol $a$ and all indices $i, j$
not in $I$ we have $x'_i = x_i$, $y'_i = y_i$ and $\phi'(a)(i, j) = \phi(a)(i, j)$. Then we let $\phi'$ also
denote the morphism determined by the map on generators $a \mapsto \phi'(a)$.

First, suppose that $j$ is an index such that column $j$ of $P$ (and therefore column
$j$ of every $\phi(a) := M_a$) is zero. By shift invariance of the measure, $(xP, \phi, y)$ is
still a representation, and $(xP)_j = 0$, so we may assume without loss of generality
that $x_j = 0$. Let $(x', \phi', y')$ be obtained from $(x, \phi, y)$ by deleting the index $j$. We
claim that $(x', \phi', y')$ still gives a linear representation of $\mu$. This is because for any
word $a_1 \dots a_m$, the difference $[x\phi(a_1) \cdots \phi(a_m)y] - [x'\phi'(a_1) \cdots \phi'(a_m)y']$ is a sum
of terms of the form

(4.9) $x(i_0) M_{a_1}(i_0, i_1) M_{a_2}(i_1, i_2) \cdots M_{a_m}(i_{m-1}, i_m) y(i_m)$

in which at least one index $i_t$ equals $j$. If $i_0 = j$, then $x(i_0) = 0$; if $i_t = j$ with
$t > 0$, then $M_{a_t}(i_{t-1}, i_t) = 0$. In either case, the product is zero.

By the analogous argument involving $y$ rather than $x$, we may pass to a new
representation by deleting the index of any zero row of $P$. We repeat until we arrive
at a representation in which no row or column of $P$ is zero.

An irreducible component of $P$ is a maximal principal submatrix $C$ which is
an irreducible matrix. $C$ is an initial component if for every index $j$ of a column
through $C$, $P(i, j) > 0$ implies that $(i, j)$ indexes an entry of $C$. $C$ is a terminal
component if for every index $i$ of a row through $C$, $P(i, j) > 0$ implies that $(i, j)$
indexes an entry of $C$.

Now suppose that $\mathcal{I}$ is the index set of an initial irreducible component of $P$,
and $x(i) = 0$ for every $i$ in $\mathcal{I}$. Define $(x', \phi', y')$ by deleting the index set $\mathcal{I}$. By an
argument very similar to the argument for deleting the index of a zero column, the
triple $(x', \phi', y')$ still gives a linear representation of $\mu$. Similarly, if $\mathcal{J}$ is the index
set of a terminal irreducible component of $P$, and $y(j) = 0$ for every $j$ in $\mathcal{J}$, we
may pass to a new representation by deleting the index set $\mathcal{J}$.

Iterating these moves, we arrive at a representation for which $P$ has no zero row
and no zero column; every initial component has an index $i$ with $x(i) > 0$; and
every terminal component has an index $j$ with $y(j) > 0$. We now claim that for
this representation the set of matrices $\{P^n\}$ is bounded. Suppose not. Then there
is a pair of indices $i, j$ for which the entries $P^n(i, j)$ are unbounded. There is some
initial component index $i_0$, and some $k \ge 0$, such that $x(i_0) > 0$ and $P^k(i_0, i) > 0$.
Likewise there is a terminal component index $j_0$ and an $m \ge 0$ such that $y(j_0) > 0$
and $P^m(j, j_0) > 0$. Appealing to shift invariance of $\mu$, for all $n > 0$ we have

(4.10) $1 = xP^{n+k+m}y \ge x(i_0) P^k(i_0, i) P^n(i, j) P^m(j, j_0) y(j_0)$,

which is a contradiction to the unboundedness of the entries $P^n(i, j)$. This proves
the family of matrices $P^n$ is bounded.

Next let $Q_n$ be the Cesàro sum, $(1/n)(P + \cdots + P^n)$. Let $Q$ be a limit of a
subsequence of the bounded sequence $\{Q_n\}$. Then $PQ = Q = QP$; $xQ$ and $Qy$ are
fixed vectors of $P$; and $(xQ, \phi, Qy)$ is a linear representation of $\mu$. It could be that
$xQ$ vanishes on all indices through some initial component, or that $Qy$ vanishes on
all indices through some terminal component. In this case we simply cycle through
our reductions until finally arriving at a linear representation $(x, \phi, y)$ of $\mu$ such that
$xP = x$; $Py = y$; the set of matrices $\{P^n\}$ is bounded; $P$ has no zero row or column;
$x$ does not vanish on all indices of any initial component; and $y$ does not vanish on
all indices of any terminal component.

If $C$ is an initial component of $P$, then the restriction of $x$ to the indices of $C$
is a nontrivial fixed vector of $C$. Thus this restriction is positive, and the spectral
radius of $C$ is at least 1. The spectral radius of $C$ must then be exactly 1, because
the set $\{P^n\}$ is bounded.

We are almost done. Suppose $P$ is not the direct sum of irreducible matrices.
Then there must be an initial component with index set $\mathcal{I}$ and a terminal component
with index set $\mathcal{J} \ne \mathcal{I}$, with some $i \in \mathcal{I}$, $j \in \mathcal{J}$ and $m$ minimal in $\mathbb{N}$ such that
$P^m(i,j) > 0$. Because $\mathcal{I}$ indexes an initial component, for any $k \in \mathbb{N}$ we have
that $(xP^k)_i$ is the sum of the terms $x_{i_0}P(i_0,i_1)\cdots P(i_{k-1},i)$ such that $i_t \in \mathcal{I}$,
$0 \le t \le k-1$. Because $\mathcal{J}$ indexes a terminal component, for any $k \in \mathbb{N}$ we have
that $(P^ky)_j$ is the sum of the terms $P(j,i_1)\cdots P(i_{k-1},i_k)\,y(i_k)$ such that $i_t \in \mathcal{J}$,
$1 \le t \le k$. Because $\mathcal{I} \ne \mathcal{J}$, by the minimality of $m$ we have for all $n \in \mathbb{N}$ that

\[ xy = xP^{m+n}y \ge \sum_{k=0}^{n} (xP^k)_i\,P^m(i,j)\,(P^{n-k}y)_j = (n+1)\,x_i\,P^m(i,j)\,y_j, \tag{4.11} \]

a contradiction.

Consequently, $P$ is now a direct sum of irreducible matrices, each of which has
spectral radius 1. The eigenvectors $x$, $y$ are now positive. Let $D$ be the diagonal
matrix with $D(i,i) = y(i)$. Define $(x', \phi', y') = (xD, D^{-1}\phi D, D^{-1}y)$. Then $(x', \phi', y')$
is a linear representation satisfying all the conditions of the theorem.
□
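
The final normalization step of the proof (conjugating by $D = \mathrm{diag}(y)$) can be checked on a small case. The following Python sketch is an illustration only and is not taken from the paper: the two-state matrices `M_a`, `M_b` and the vectors `x`, `y` are hypothetical data, chosen so that $P = M_a + M_b$ is irreducible, $xP = x$, $Py = y$ and $xy = 1$. Exact rational arithmetic is used so that equalities can be tested directly.

```python
from fractions import Fraction as Fr

def word_value(x, mats, word, y):
    """Return x * M_{w_1} * ... * M_{w_m} * y for a word given as a string."""
    row = list(x)
    for a in word:
        M = mats[a]
        row = [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(row))]
    return sum(r * t for r, t in zip(row, y))

def normalize(x, mats, y):
    """Conjugate by D = diag(y): (x, phi, y) -> (xD, D^{-1} phi D, D^{-1} y)."""
    n = len(y)
    x_new = [x[i] * y[i] for i in range(n)]
    mats_new = {a: [[M[i][j] * y[j] / y[i] for j in range(n)] for i in range(n)]
                for a, M in mats.items()}
    y_new = [Fr(1)] * n
    return x_new, mats_new, y_new

# Hypothetical data (an assumption of this sketch): P = M_a + M_b is irreducible,
# x is a positive left fixed vector, y a positive right fixed vector, and xy = 1.
x = [Fr(1, 6), Fr(2, 3)]
y = [Fr(2), Fr(1)]
mats = {"a": [[Fr(1, 2), Fr(0)], [Fr(1, 8), Fr(0)]],
        "b": [[Fr(0), Fr(1)], [Fr(0), Fr(3, 4)]]}

x_new, mats_new, y_new = normalize(x, mats, y)
# After conjugation the sum of the matrices should be stochastic.
P_new = [[mats_new["a"][i][j] + mats_new["b"][i][j] for j in range(2)] for i in range(2)]
```

In this example `P_new` has rows summing to 1, `x_new` is a probability vector, and `word_value` returns the same number for the old and the new triple on every word, which is what the proof requires of the new representation.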

Example 4.18. The conclusion of the Proposition does not follow without the
hypothesis of stationarity: there need not be any linear representation with positive
vectors $x$, $y$, and there need not be any linear representation in which the
nonnegative vectors $x$, $y$ are fixed vectors of $P$. For example, consider the nonstationary
Markov measure $\mu$ on two states $a$, $b$ with initial vector $p = (1,0)$ and transition
matrix

\[ T = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = N_a + N_b. \]

If $q$ is the column vector $(1,1)^{tr}$, then $p, N_a, N_b, q$ generate a linear representation
of $\mu$, e.g. $1 = \mu(\mathcal{C}_0(a)) = pN_aq$, and $(1/2)^k = \mu(\mathcal{C}_0(a^kb^m)) = p(N_a)^k(N_b)^mq$ when
$k, m > 0$.

Now suppose that there is a linear representation of $\mu$ generated by positive
vectors $x$, $y$ and nonnegative matrices $M_a$, $M_b$. Then

\[ 1 = \mu(\mathcal{C}_0(a)) = xM_ay, \qquad 0 = \mu(\mathcal{C}_0(b)) = xM_by. \tag{4.13} \]

From the second of these equations, $M_b = 0$, since $x > 0$ and $y > 0$. But this
contradicts $0 < \mu(\mathcal{C}_0(ab)) = xM_aM_by$.

Next suppose there is a linear representation for which $x$, $y$ could be chosen
eigenvectors of $P = M_a + M_b$ (necessarily with eigenvalue 1, since $xP^ny = 1$ for all
$n > 0$). Then

\[ \tfrac{1}{2} = \mu(\mathcal{C}_0(ab)) = xM_aM_by \le xPM_by = xM_by = \mu(\mathcal{C}_0(b)) = 0, \tag{4.14} \]

which is a contradiction.
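
The values quoted in Example 4.18 can be reproduced by a short computation. The sketch below is illustrative and rests on an assumption: it takes the transition matrix to be $T = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix}$ and lets $N_s$ keep the row of $T$ indexed by the state $s$, which is the choice consistent with the stated values $\mu(\mathcal{C}_0(a)) = 1$ and $\mu(\mathcal{C}_0(a^kb^m)) = (1/2)^k$. The helper names `series` and `markov` are hypothetical.

```python
from fractions import Fraction as Fr

# Assumed transition matrix (state order: a = 0, b = 1); see the lead-in.
T = [[Fr(1, 2), Fr(1, 2)], [Fr(0), Fr(1)]]
p = [Fr(1), Fr(0)]          # initial vector
q = [Fr(1), Fr(1)]          # column vector of ones
# N_s keeps row s of T and is zero elsewhere.
N = {"a": [T[0], [Fr(0), Fr(0)]],
     "b": [[Fr(0), Fr(0)], T[1]]}

def series(word):
    """Value p * N_{w_1} * ... * N_{w_m} * q of the linear representation."""
    row = p[:]
    for s in word:
        row = [sum(row[i] * N[s][i][j] for i in range(2)) for j in range(2)]
    return sum(row[j] * q[j] for j in range(2))

def markov(word):
    """Cylinder probability p(w_1) T(w_1,w_2) ... T(w_{m-1},w_m) of a nonempty word."""
    idx = ["ab".index(s) for s in word]
    prob = p[idx[0]]
    for i, j in zip(idx, idx[1:]):
        prob *= T[i][j]
    return prob
```

With these definitions `series("a")` is 1 while `series("aa") + series("ba")` is 1/2, so the measure is visibly not shift-invariant, which is the feature the example exploits.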



4.2. Equivalent characterizations of hidden Markov measures. 



4.2.1. Sofic measures — formal series approach. The semiring $\mathcal{F}(\mathcal{A})$ of formal series
on the alphabet $\mathcal{A}$ is an $\mathbb{R}_+$-module in a natural way. On this module we have a
(linear) action of $\mathcal{A}^*$ defined as follows:

For $F \in \mathcal{F}(\mathcal{A})$ and $w \in \mathcal{A}^*$, define $(w, F) \to w^{-1}F$ by
$(w^{-1}F)(v) = F(wv)$ for all $v \in \mathcal{A}^*$.

Thus

\[ w^{-1}F = \sum_{v \in \mathcal{A}^*} F(wv)\,v. \]

If $F = u \in \mathcal{A}^*$, then

\[ (w^{-1}F)(v) = u(wv) = \begin{cases} 1 & \text{if } wv = u, \\ 0 & \text{if } wv \ne u. \end{cases} \]

Thus $w^{-1}u \ne 0$ if and only if $u = wv$ for some $v \in \mathcal{A}^*$, and then $w^{-1}u = v$ (in the
sense that they are the same function on $\mathcal{A}^*$): $w^{-1}v$ erases $w$ from $v$ if $v$ has $w$ as
a prefix, otherwise $w^{-1}v$ gives 0. Note also that this is a monoid action:

\[ (vw)^{-1}F = w^{-1}(v^{-1}F). \tag{4.15} \]

Definition 4.19. A submodule $M$ of $\mathcal{F}(\mathcal{A})$ is called stable if $w^{-1}F \in M$ for all
$F \in M$, i.e. $w^{-1}M \subset M$, for all $w \in \mathcal{A}^*$.

Theorem 4.20. Let $\mathcal{A}$ be a finite alphabet. For a formal series $F \in \mathcal{F}_{\mathbb{R}_+}(\mathcal{A})$ that
corresponds to a shift-invariant probability measure $\nu$ on $\Omega_+(\mathcal{A})$, the following are
equivalent:

(1) $F$ is linearly representable.

(2) $F$ is a member of a stable finitely generated submodule of $\mathcal{F}_{\mathbb{R}_+}(\mathcal{A})$.

(3) $F$ is rational.

(4) The measure $\nu$ is the image under a 1-block map of a shift-invariant 1-step
Markov probability measure $\mu$.

In the latter case, the measure $\nu$ is ergodic if and only if it is possible to choose $\mu$
ergodic.

In the next few sections we sketch the proof of this theorem.

4.2.2. Proof that a series is linearly representable if and only if it is a member of
a stable finitely generated submodule of $\mathcal{F}(\mathcal{A})$. Suppose that $F$ is linearly
representable by $(x, \phi, y)$. For each $i = 1, 2, \dots, n$ (where $n$ is the dimension of the
representation) and each $w \in \mathcal{A}^*$, define

\[ F_i(w) = [\phi(w)y]_i. \]

Let $M = \langle F_1, \dots, F_n \rangle$ be the span of the $F_i$ with coefficients in $\mathbb{R}_+$, which is a
submodule of $\mathcal{F}(\mathcal{A})$. Since

\[ F(w) = x\phi(w)y = \sum_{i=1}^{n} x_i[\phi(w)y]_i = \sum_{i=1}^{n} x_iF_i(w), \]

we have that $F = \sum_{i=1}^{n} x_iF_i$, which means $F \in M$.



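We next show that $M$ is stable. Let $w \in \mathcal{A}^*$. Then for $u \in \mathcal{A}^*$,

\[ (w^{-1}F_i)(u) = F_i(wu) = [\phi(wu)y]_i = [\phi(w)\phi(u)y]_i = \sum_{j=1}^{n} \phi(w)_{ij}[\phi(u)y]_j = \sum_{j=1}^{n} \phi(w)_{ij}F_j(u). \]

Since $\phi(w)_{ij} \in \mathbb{R}_+$, the series $\sum_{j=1}^{n} \phi(w)_{ij}F_j$ belongs to $M$, so

\[ w^{-1}F_i = \sum_{j=1}^{n} \phi(w)_{ij}F_j \in \langle F_1, \dots, F_n \rangle = M. \]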
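
The stability identity just derived can be tested numerically. The following Python sketch is only an illustration under stated assumptions: the two-dimensional representation `PHI`, `Y` over the alphabet {a, b} is hypothetical data, and the check runs over words of bounded length rather than all of $\mathcal{A}^*$.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical 2-dimensional representation over the alphabet {a, b}.
PHI = {"a": [[Fr(1, 2), Fr(0)], [Fr(1, 4), Fr(0)]],
       "b": [[Fr(0), Fr(1, 2)], [Fr(0), Fr(3, 4)]]}
Y = [Fr(1), Fr(1)]
N = len(Y)

def phi(word):
    """Monoid morphism: phi(a_1...a_m) = phi(a_1)...phi(a_m); phi(empty word) = I."""
    M = [[Fr(int(i == j)) for j in range(N)] for i in range(N)]
    for a in word:
        M = [[sum(M[i][k] * PHI[a][k][j] for k in range(N)) for j in range(N)]
             for i in range(N)]
    return M

def F_i(i, word):
    """F_i(w) = [phi(w) y]_i."""
    row = phi(word)[i]
    return sum(row[j] * Y[j] for j in range(N))

def quotient_is_combination(w, i, max_len=3):
    """Check (w^{-1}F_i)(u) = sum_j phi(w)_{ij} F_j(u) for all words u up to max_len."""
    coeffs = phi(w)[i]
    words = ("".join(t) for m in range(max_len + 1) for t in product("ab", repeat=m))
    return all(F_i(i, w + u) == sum(coeffs[j] * F_i(j, u) for j in range(N))
               for u in words)
```

The check succeeds for every `w` and `i` because it is just associativity of matrix multiplication, which is the content of the displayed computation.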
Conversely, let $M$ be a stable finitely generated left submodule, and assume that
$F \in \langle F_1, \dots, F_n \rangle = M$. Then there are $x_1, \dots, x_n \in \mathbb{R}_+$ such that $F = \sum_{i=1}^{n} x_iF_i$.
Since $M$ is stable, for each $a \in \mathcal{A}$ and each $i = 1, 2, \dots, n$, we have that $a^{-1}F_i \in
\langle F_1, \dots, F_n \rangle$. So there exist $c_{ij} \in \mathbb{R}_+$, $j = 1, 2, \dots, n$, such that $a^{-1}F_i = \sum_{j=1}^{n} c_{ij}F_j$.
Define $\phi(a)_{ij} = c_{ij}$ for $i, j = 1, 2, \dots, n$. Note by linearity that for any nonnegative
row vector $(t_1, \dots, t_n)$ we have

\[ a^{-1}\Big(\sum_{i=1}^{n} t_iF_i\Big) = \sum_{j=1}^{n} \big((t_1, \dots, t_n)\phi(a)\big)_j F_j. \tag{4.16} \]

Extend $\phi$ to a monoid morphism $\phi : \mathcal{A}^* \to \mathbb{R}_+^{n \times n}$ by defining $\phi(a_1 \cdots a_m) =
\phi(a_1) \cdots \phi(a_m)$. Because the action of $\mathcal{A}^*$ on $\mathcal{F}(\mathcal{A})$ satisfies the monoidal
condition (4.15), we have from (4.16) that for any $w = a_1a_2 \cdots a_m \in \mathcal{A}^*$,

\[ w^{-1}\Big(\sum_{i=1}^{n} t_iF_i\Big) = \sum_{j=1}^{n} \big((t_1, \dots, t_n)\phi(a_1) \cdots \phi(a_m)\big)_j F_j = \sum_{j=1}^{n} \big((t_1, \dots, t_n)\phi(w)\big)_j F_j. \]

Define the column vector $y$ by $y_j = F_j(1)$ for $j = 1, 2, \dots, n$ and let $x$ be the row
vector $(x_1, \dots, x_n)$. Then

\[ F(w) = (w^{-1}F)(1) = \Big(\sum_{j=1}^{n} (x\phi(w))_j F_j\Big)(1) = \sum_{j=1}^{n} (x\phi(w))_j F_j(1) = x\phi(w)y, \tag{4.17} \]

showing that $(x, \phi, y)$ is a linear representation for $F$.



4.2.3. Proof that a formal series is linearly representable if and only if it is rational.
This equivalence is from [59, 89]. Recall that a series is rational if and only if it
is in the closure of the polynomials under the rational operations $+$ (union), $\cdot$
(concatenation), $*$, and multiplication by elements of $\mathbb{R}_+$.

First we prove by a series of steps that every rational series $F$ is linearly
representable.

Proposition 4.21. Every polynomial is linearly representable.

Proof. If $w \in \mathcal{A}^*$ and $|w|$ is greater than the degree of the polynomial $F$, then
$w^{-1}F = 0$. Let $S = \{w^{-1}F : w \in \mathcal{A}^*\}$. Then $S$ is finite and stable, hence $S$ spans a
finitely generated stable submodule $M$ to which $F$ belongs. (Take $\epsilon^{-1}F = F$.) By
Section 4.2.2, $F$ is linearly representable. □
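
The finiteness of $S$ in this proof is easy to see in a small computation. The following Python sketch is illustrative only: it stores a polynomial as a dictionary from words (strings) to coefficients, and the sample polynomial `F` and the helper names are hypothetical. The orbit is generated letter by letter, which is legitimate because of the monoid-action rule (4.15).

```python
def left_quotient(w, F):
    """(w^{-1}F)(v) = F(wv); F is a dict mapping words (str) to coefficients."""
    return {v[len(w):]: c for v, c in F.items() if v.startswith(w)}

def quotient_orbit(F, alphabet):
    """All distinct series w^{-1}F, w in A*, found by applying letters repeatedly."""
    key = lambda G: tuple(sorted(G.items()))   # hashable fingerprint of a polynomial
    seen = {key(F): F}
    frontier = [F]
    while frontier:
        G = frontier.pop()
        for a in alphabet:
            H = left_quotient(a, G)
            if key(H) not in seen:
                seen[key(H)] = H
                frontier.append(H)
    return list(seen.values())

# Hypothetical polynomial F = 2*ab + 3*abb + 1*b over the alphabet {a, b}.
F = {"ab": 2, "abb": 3, "b": 1}
orbit = quotient_orbit(F, "ab")
```

For this `F` the orbit has six elements, including the zero series (the empty dictionary), and every quotient by a word longer than the degree 3 is zero.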



The next observation follows immediately from the definition of stability. The
proof of the Lemma is included for practice.

Proposition 4.22. If $F_1$ and $F_2$ are in stable finitely generated submodules of $\mathcal{F}(\mathcal{A})$
and $t \in \mathbb{R}_+$, then $(F_1 + F_2)$ and $(tF_1)$ are in stable finitely generated submodules of
$\mathcal{F}(\mathcal{A})$.

Lemma 4.23. For $F, G \in \mathcal{F}(\mathcal{A})$ and $a \in \mathcal{A}$, $a^{-1}(FG) = (a^{-1}F)G + F(\epsilon)\,a^{-1}G$.

Proof. For any $w \in \mathcal{A}^*$,

\begin{align*}
(a^{-1}(FG))(w) = (FG)(aw) &= \sum_{uv = aw} F(u)G(v) \\
&= F(\epsilon)G(aw) + \sum_{u'v' = w} F(au')G(v') \\
&= F(\epsilon)G(aw) + \sum_{u'v' = w} (a^{-1}F)(u')G(v') \\
&= F(\epsilon)(a^{-1}G)(w) + ((a^{-1}F)G)(w). \tag{4.18}
\end{align*}

□
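
Lemma 4.23 can be confirmed on polynomials by direct computation. The sketch below is illustrative: series are dictionaries from words to coefficients, the sample polynomials `F` and `G` are hypothetical, and the function names are not from the paper.

```python
def left_quotient(a, F):
    """(a^{-1}F)(v) = F(av) for a series stored as a dict word -> coefficient."""
    return {v[len(a):]: c for v, c in F.items() if v.startswith(a)}

def cauchy_product(F, G):
    """(FG)(w) = sum over factorizations w = uv of F(u)G(v)."""
    H = {}
    for u, c in F.items():
        for v, d in G.items():
            H[u + v] = H.get(u + v, 0) + c * d
    return H

def add_scaled(F, G, t=1):
    """Return F + t*G, dropping zero coefficients."""
    H = dict(F)
    for v, d in G.items():
        H[v] = H.get(v, 0) + t * d
    return {w: c for w, c in H.items() if c != 0}

def lemma_sides(a, F, G):
    """Both sides of a^{-1}(FG) = (a^{-1}F)G + F(empty word) * a^{-1}G."""
    lhs = add_scaled(left_quotient(a, cauchy_product(F, G)), {})
    rhs = add_scaled(cauchy_product(left_quotient(a, F), G),
                     left_quotient(a, G), F.get("", 0))
    return lhs, rhs

# Hypothetical polynomials; F has nonzero constant term so the second summand matters.
F = {"": 2, "a": 1, "ab": 3}
G = {"a": 5, "b": 1}
```

Calling `lemma_sides("a", F, G)` or `lemma_sides("b", F, G)` returns two equal dictionaries; the term `F.get("", 0)` plays the role of $F(\epsilon)$.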



Proposition 4.24. Suppose that for $i = 1, 2$, $F_i \in M_i$, where each $M_i$ is a stable,
finitely generated submodule. Let $M = M_1F_2 + M_2$. Then $M$ is finitely generated
and stable and contains $F_1F_2$.

Proof. The facts that $F_1F_2 \in M$ and $M$ is finitely generated are immediate. The
proof that $M$ is stable is a consequence of the Lemma. For if $f_1F_2 + f_2$ is an element
of $M$ and $a \in \mathcal{A}$, then

\[ a^{-1}(f_1F_2 + f_2) = (a^{-1}f_1)F_2 + f_1(\epsilon)(a^{-1}F_2) + a^{-1}f_2. \tag{4.19} \]

Note that $a^{-1}f_1 \in M_1$ and $a^{-1}f_2, a^{-1}F_2 \in M_2$. Thus $f_1(\epsilon)(a^{-1}F_2) + a^{-1}f_2 \in M_2$, so
we conclude that $M$ is stable.

□

Lemma 4.25. If $F$ is proper (that is, $F(\epsilon) = 0$) and $a \in \mathcal{A}$, then $a^{-1}(F^*) =
(a^{-1}F)F^*$.

Proof. Recall that $F^* = \sum_{n \ge 0} F^n$. Thus $a^{-1}(F^*) = a^{-1}(\epsilon + FF^*) =
a^{-1}\epsilon + (a^{-1}F)F^* + F(\epsilon)(a^{-1}F^*)$.

Because $(a^{-1}\epsilon)(w) = \epsilon(aw) = 0$ for all $w \in \mathcal{A}^*$ and $F(\epsilon) = 0$, we get that
$a^{-1}F^* = (a^{-1}F)F^*$. □

Proposition 4.26. Suppose $M_1$ is finitely generated and stable, and that $F_1 \in M_1$
is proper. Then $F_1^*$ is in a finitely generated stable submodule.

Proof. Define $M = \mathbb{R}_+ + M_1F_1^*$. We have

\[ F_1^* = 1 + \sum_{n \ge 1} F_1^n = (1 + F_1F_1^*) \in M. \]

Also $M$ is finitely generated (by 1 and the $f_iF_1^*$ if the $f_i$ generate $M_1$).

To show that $M$ is stable, suppose that $t \in \mathbb{R}_+$ and $a \in \mathcal{A}$. Then for any $u \in \mathcal{A}^*$
we have $(a^{-1}t)(u) = t(au) = 0$, so $a^{-1}t = 0 \in M$. And for any $f_1 \in M_1$ and $a \in \mathcal{A}$,
$a^{-1}(f_1F_1^*) = (a^{-1}f_1)F_1^* + f_1(\epsilon)\,a^{-1}(F_1^*)$. Since $M_1$ is stable, $a^{-1}f_1 \in M_1$ and the
first term is in $M_1F_1^*$. By the Lemma, the second term is $f_1(\epsilon)(a^{-1}F_1)F_1^*$, which
is again in $M_1F_1^*$. □



These observations show that if F is rational, then F lies in a finitely generated 
stable submodule, so by Section 4.2.2 F is linearly representable. 

Now we turn our attention to proving the statement in the title of this section
in the other direction. So assume that $F \in \mathcal{F}(\mathcal{A})$ is linearly representable. Then
$F(w) = x\phi(w)y$ for all $w \in \mathcal{A}^*$ for some $(x, \phi, y)$. Consider the semiring of formal
series $\mathcal{F}_K(\mathcal{A}) = K^{\mathcal{A}^*}$, where $K$ is the semiring $\mathbb{R}_+^{n \times n}$ of $n \times n$ nonnegative real
matrices and $n$ is the dimension of the representation. Let $D = \sum_{a \in \mathcal{A}} \phi(a)a \in
\mathcal{F}_K(\mathcal{A})$. The series $D$ is proper, so we can form

\[ D^* = \sum_{h \ge 0} D^h = \sum_{h \ge 0} \Big(\sum_{a \in \mathcal{A}} \phi(a)a\Big)^h = \sum_{h \ge 0} \Big(\sum_{w \in \mathcal{A}^h} \phi(w)w\Big) = \sum_{w \in \mathcal{A}^*} \phi(w)w. \tag{4.20} \]

This series $D^*$ is a rational element of $\mathcal{F}_K(\mathcal{A})$, since we started with a polynomial
and formed its $*$. By Lemma 4.27 below, each entry $(D^*)_{ij}$ is rational in $\mathcal{F}_{\mathbb{R}_+}(\mathcal{A})$.

With $D$ and $D^*$ now defined, we have that

\[ F(w) = x\phi(w)y = \sum_{i,j} x_i\,\phi(w)_{ij}\,y_j = \sum_{i,j} x_i\,D^*(w)_{ij}\,y_j, \tag{4.21} \]

and each $D^*(w)_{ij}$ is a rational series applied to $w$. Thus $F(w)$ is a finite linear
combination of rational series $D^*_{ij}$ applied to $w$ and hence is rational.

Lemma 4.27. Suppose $D$ is an $n \times n$ matrix whose entries are proper rational
formal series (e.g., polynomials). Then the entries of $D^*$ are also rational.

Proof. We use induction on $n$. The case $n = 1$ is trivial. Suppose the lemma holds
for $n - 1$, and $D$ is $n \times n$ with block form $D = \begin{pmatrix} a & u \\ v & Y \end{pmatrix}$, with $a$ a rational series.

The entries of $D$ can be thought of as labels on a directed graph; a path in the
graph has a label which is the product of the labels of its edges; and then $D^*(i,j)$
represents the sum of the labels of all paths from $i$ to $j$ (interpret the term "1"
in $D^*(i,i)$ as the label of a path of length zero). With this view, one can see that
$D^* = \begin{pmatrix} b & w \\ x & Z \end{pmatrix}$, where

(1) $b = (a + uY^*v)^*$,

(2) $Z = (Y + va^*u)^*$,

(3) $w = buY^*$,

(4) $x = Y^*vb$.

Now $Y^*$ and $Z$ have rational entries by the induction hypothesis, and consequently
all entries of $D^*$ are rational. □
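
The block formula for $D^*$ can be sanity-checked in a numerical analogue. The sketch below is an illustration under a stated assumption, not the formal-series statement itself: the entries $a, u, v, Y$ are replaced by nonnegative rational numbers small enough that every geometric series converges, so that $t^* = 1/(1-t)$ and $D^* = (I - D)^{-1}$. The sample values are hypothetical.

```python
from fractions import Fraction as Fr

def star(t):
    """Sum of the geometric series 1 + t + t^2 + ... for a number 0 <= t < 1."""
    assert 0 <= t < 1
    return 1 / (1 - t)

def block_star(a, u, v, Y):
    """Entries b, w, x, Z of D* for D = [[a, u], [v, Y]], following the four formulas."""
    Ys = star(Y)
    b = star(a + u * Ys * v)
    Z = star(Y + v * star(a) * u)
    w = b * u * Ys
    x = Ys * v * b
    return [[b, w], [x, Z]]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical numeric stand-ins for the series a, u, v, Y.
a, u, v, Y = Fr(1, 4), Fr(1, 5), Fr(1, 3), Fr(1, 6)
D_star = block_star(a, u, v, Y)
I_minus_D_inverse = inverse_2x2([[1 - a, -u], [-v, 1 - Y]])
```

Here `D_star` and `I_minus_D_inverse` agree entry by entry, as the path-counting interpretation in the proof predicts.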
4.2.4. Linearly representable series correspond to sofic measures. The (topological)
support of a measure is the smallest closed set of full measure. Recall our convention
(Sec. 2.4) that Markov and sofic measures are ergodic with full support.

Theorem 4.28 [39, 46, 48]. A shift-invariant probability measure $\nu$ on $\Omega_+(\mathcal{A})$
corresponds to a linearly representable (equivalently, rational) formal series $F =
F_\nu \in \mathcal{F}_{\mathbb{R}_+}(\mathcal{A})$ if and only if it is a convex combination of measures which (restricted
to their supports) are sofic measures. Moreover, if $(x, \phi, y)$ is a representation of
$F_\nu$ such that $x$ and $y$ are positive and the matrix $\sum_{a \in \mathcal{A}} \phi(a)$ is irreducible, then $\nu$
is a sofic measure.

Proof. Suppose that $\nu$ is the image under a 1-block map (determined by a map
$\pi : \mathcal{A} \to \mathcal{B}$ between the alphabets) of a 1-step Markov measure $\mu$. Then $\nu$ is
linearly representable by the construction in Example 4.15.

Alternatively, if $F_\mu$ is represented by $(x, \phi, y)$ then for each $w \in \mathcal{A}^*$ we have

\[ F_\mu(w) = \sum_{i,j} x_i\,\phi(w)_{ij}\,y_j = \sum_{i,j} x_i \Big(\Big(\sum_{a \in \mathcal{A}} \phi(a)a\Big)^*(w)\Big)_{ij} y_j. \tag{4.22} \]

For $u \in \mathcal{B}^*$ define

\[ F_\nu(u) = \sum_{i,j} x_i \Big(\Big(\sum_{b \in \mathcal{B}} \Big(\sum_{a \in \mathcal{A},\, \pi(a) = b} \phi(a)\Big) b\Big)^*(u)\Big)_{ij} y_j \tag{4.23} \]

to see that $F_\nu$ is a linear combination of rational series and to see its linear
representation.



Conversely, suppose that $\nu$ corresponds to a rational (and hence linearly
representable) formal series $F = F_\nu \in \mathcal{F}_{\mathbb{R}_+}(\mathcal{B})$ with dimension $n$. Let $(x, \phi, y)$ represent
$F$. To indicate an ordering of the alphabet $\mathcal{B}$, we use notation $\mathcal{B} = \{1, 2, \dots, k\}$
and $\phi(i) = P_i$, and we write $P = \sum_i P_i$. First assume that the $n \times n$ matrix $P$ is
irreducible and the vectors $x$ and $y$ are positive. We will construct a Markov measure $\mu$
and a 1-block map $\pi$ such that $\nu = \pi\mu$.



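Applying the standard stochasticization trick as in the last paragraph of the proof
of Proposition 4.16, we may assume that the irreducible matrix $P$ is stochastic,
every entry of $y$ is 1, and $x$ is stochastic. Define matrices with block forms,

\[ M = \begin{pmatrix} P_1 & P_2 & \cdots & P_k \\ P_1 & P_2 & \cdots & P_k \\ \vdots & \vdots & & \vdots \\ P_1 & P_2 & \cdots & P_k \end{pmatrix}, \qquad R = \begin{pmatrix} I \\ I \\ \vdots \\ I \end{pmatrix}, \qquad C = \begin{pmatrix} P_1 & P_2 & \cdots & P_k \end{pmatrix}, \]

\[ M_i = \begin{pmatrix} 0 & \cdots & P_i & \cdots & 0 \\ 0 & \cdots & P_i & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & P_i & \cdots & 0 \end{pmatrix}, \]

where each $P_i$ is $n \times n$; $R$ is $nk \times n$; $I$ is the $n \times n$ identity matrix; $C$ is $n \times nk$;
$M$ and the $M_i$ are $nk \times nk$; and $M_i$ is zero except in the $i$'th block column, where it
is $RP_i$. The matrix $M$ is stochastic, but it can have zero columns. (We thank Uijin Jung for
pointing this out.) Let $M'$ be the largest principal submatrix of $M$ with no zero
column or row.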

We have a strong shift equivalence $M = RC$, $P = CR$, and it then follows
from the irreducibility of $P$ that $M'$ is irreducible. Therefore, there is a unique left
stochastic fixed vector $X$ for $M$. Let $Y$ be the $nk \times 1$ column vector with every
entry 1. We have $MR = RP$, and consequently $XR = x$. Also, $M_iR = RP_i$ for
each $i$. So, for any word $i_1 \cdots i_j$, we have

\[ xP_{i_1} \cdots P_{i_j}y = XRP_{i_1} \cdots P_{i_j}y = XM_{i_1} \cdots M_{i_j}Ry = XM_{i_1} \cdots M_{i_j}Y. \]

This shows that $(X, \Phi, Y)$ is also a representation of $F_\nu$, where $\Phi(i) = M_i$. Let
$X'$, $\Phi'(i) = M_i'$, $Y'$ be the restrictions of $X$, $\Phi(i)$, $Y$ to the vectors/matrices on the
indices of $M'$. Then $(X', \Phi', Y')$ is also a representation of $F_\nu$. Let $A'$ be the
0, 1 matrix of size matching $M'$ whose zero entries are the same as for $M'$. Then
$(X', M', Y')$ defines an ergodic Markov measure $\mu$ on $\Omega_{A'}$ and there is a 1-block
code $\pi$ such that $\pi\mu = \nu$. Explicitly, $\pi$ is the restriction of the code which sends
$\{1, 2, \dots, n\}$ to 1; $\{n+1, n+2, \dots, 2n\}$ to 2; and so on. Thus $\nu$ is a sofic measure.
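
The lifting from $(x, \{P_i\}, y)$ to $(X, \{M_i\}, Y)$ can be traced on a small example. The following Python sketch is illustrative only: the matrices `P1`, `P2` and the vector `x` are hypothetical data with $P = P_1 + P_2$ stochastic and irreducible, and the explicit formula $X = xC$ is an observation added here (it satisfies $XM = xCRC = xPC = xC$ and $XR = xP = x$), not a formula quoted from the text.

```python
from fractions import Fraction as Fr

def lift(Ps):
    """Return the nk x nk matrices M_i: zero except block column i, which holds R*P_i."""
    k, n = len(Ps), len(Ps[0])
    Ms = []
    for i, Pi in enumerate(Ps):
        Mi = [[Fr(0)] * (n * k) for _ in range(n * k)]
        for blk in range(k):                  # every block row gets a copy of P_i
            for r in range(n):
                for c in range(n):
                    Mi[blk * n + r][i * n + c] = Pi[r][c]
        Ms.append(Mi)
    return Ms

def word_value(left, mats, word, right):
    """left * mats[w_1] * ... * mats[w_m] * right for a word given as a tuple of indices."""
    row = list(left)
    for i in word:
        M = mats[i]
        row = [sum(row[r] * M[r][c] for r in range(len(row))) for c in range(len(row))]
    return sum(a * b for a, b in zip(row, right))

# Hypothetical data: n = 2, k = 2, P = P1 + P2 stochastic with left fixed vector x.
P1 = [[Fr(1, 2), Fr(0)], [Fr(1, 4), Fr(0)]]
P2 = [[Fr(0), Fr(1, 2)], [Fr(0), Fr(3, 4)]]
Ps = [P1, P2]
x = [Fr(1, 3), Fr(2, 3)]
y = [Fr(1), Fr(1)]
n, k = 2, 2

Ms = lift(Ps)
M = [[sum(Mi[r][c] for Mi in Ms) for c in range(n * k)] for r in range(n * k)]
# X = xC with C = (P_1 P_2): entry i*n + c is (x P_i)_c.  (Assumption explained above.)
X = [sum(x[r] * Ps[i][r][c] for r in range(n)) for i in range(k) for c in range(n)]
Y = [Fr(1)] * (n * k)
```

In this example $M$ has two zero columns, which illustrates why the passage to the principal submatrix $M'$ is needed before a Markov measure can be defined.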

Now, for the representation $(x, \phi, y)$ of $F_\nu$, we drop the assumption that the
matrix $P$ is irreducible. However, by Proposition 4.16, without loss of generality
we may assume that $P$ is the direct sum of irreducible stochastic matrices $P^{(j)}$;
$x$ is a positive stochastic left fixed vector of $P$; and $y$ is the column vector with
every entry 1. Restricted to the indices through $P^{(j)}$, $x$ is a fixed vector of $P^{(j)}$
and therefore is a multiple $c_jx^{(j)}$ of the stochastic left fixed vector $x^{(j)}$ of $P^{(j)}$.
Note, $\sum_j c_j = 1$. If $y^{(j)}$ denotes the column vector with every entry 1 such that
$P^{(j)}y^{(j)} = y^{(j)}$, and $\phi^{(j)}(w)$ denotes the restriction of $\phi(w)$ to the indices
through $P^{(j)}$, then

\[ F_\nu(w) = x\phi(w)y = \sum_j c_j\,x^{(j)}\phi^{(j)}(w)\,y^{(j)}. \]

It follows from the irreducible case that $\nu$ is a convex combination of sofic measures.

□ 

4.3. Sofic measures — Furstenberg's approach. Below we are extracting from 
[39, Sees. 18-19] only what we need to describe Furstenberg's approach to the 
identification of sofic measures and compare it to the others. This leaves out a lot. 
We follow Furstenberg's notation, apart from change of symbols, except that we 
refer to shift-invariant measures as well as finite-state stationary processes.

Furstenberg begins with the following definition.

Definition 4.29. [39, Definition 18.1] A stochastic semigroup of order $r$ is a
semigroup $S$ having an identity $e$ (i.e., a monoid), together with a set of $r$ elements
$A = \{a_1, a_2, \dots, a_r\}$ generating $S$, and a real-valued function $F$ defined on $S$
satisfying

(1) $F(e) = 1$,

(2) $F(s) \ge 0$ for each $s \in S$ and $F(a_i) > 0$, $i = 1, 2, \dots, r$,

(3) $\sum_{i=1}^{r} F(a_is) = \sum_{i=1}^{r} F(sa_i) = F(s)$ for each $s \in S$.
Given a subshift $X$ on an alphabet $\{a_1, a_2, \dots, a_r\}$ with shift-invariant Borel
probability $\mu$ and $\mu(a_i) > 0$ for every $i$, let $S$ be the free semigroup of all formal
products of the $a_i$, with the empty product taken as the identity $e$. Define $F$
on $S$ by $F(e) = 1$ and $F(a_{i_1}a_{i_2} \cdots a_{i_k}) = \mu(\mathcal{C}_0(a_{i_1}a_{i_2} \cdots a_{i_k}))$. Clearly the triple
$(\{a_1, a_2, \dots, a_r\}, S, F)$ is a stochastic semigroup, which we denote $S(X)$.

Conversely, any stochastic semigroup $(\{a_1, a_2, \dots, a_r\}, S, F)$ determines a unique
shift-invariant Borel probability $\mu$ for which $F(a_{i_1}a_{i_2} \cdots a_{i_k}) = \mu(\mathcal{C}_0(a_{i_1}a_{i_2} \cdots a_{i_k}))$
for all $a_{i_1}a_{i_2} \cdots a_{i_k}$. We denote by $X(S)$ this finite-state stationary process
(equivalently the full shift on $r$ symbols with invariant measure $\mu$). Two stochastic
semigroups are called equivalent if they define the same finite-state stationary process
modulo a bijection of their alphabets. A cone in a linear space is a subset closed
under addition and multiplication by positive real numbers [39, Sec. 15.1].

Definition 4.30. [39, Definition 19.1] Let $D$ be a linear space, $D^*$ its dual, and
let $\mathcal{C}$ be a cone in $D$ such that for all $x, y$ in $D$, if $x + \lambda y \in \mathcal{C}$ for all real $\lambda$, then
$y = 0$. Let $\theta \in \mathcal{C}$ and $\theta^* \in D^*$, and suppose that $\theta^*$ is nonnegative on $\mathcal{C}$. A linear
stochastic semigroup $S$ on $(\mathcal{C}, \theta, \theta^*)$ is a stochastic semigroup $(\{a_1, \dots, a_r\}, S, F)$
whose elements are linear transformations from $\mathcal{C}$ to $\mathcal{C}$ satisfying

(1) $\sum_i a_i\theta = \theta$;

(2) $\sum_i a_i^*\theta^* = \theta^*$ (where $L^*$ denotes the transformation of $D^*$ adjoint to a
transformation $L$ of $D$);

(3) $F(s) = (\theta^*, s\theta)$ for $s \in S$, where $(\cdot, \cdot)$ denotes the dual pairing of $D^*$ and
$D$;

(4) $(\theta^*, a_i\theta) > 0$, $i = 1, 2, \dots, r$.

$(S, D, \mathcal{C}, \theta, \theta^*)$ was called finite dimensional by Furstenberg if there is $m \in \mathbb{N}$ such
that $D = \mathbb{R}^m$, $\mathcal{C}$ is the cone of vectors in $\mathbb{R}^m$ with all entries nonnegative, and each
element of $S$ is an $m \times m$ matrix with nonnegative entries.

A semigroup $S$ of transformations satisfying (1) to (4) does define a stochastic
semigroup if $(\theta^*, \theta) = 1$.

Theorem 4.31. [39, Theorem 19.1] Every stochastic semigroup S is equivalent to 
some linear stochastic semigroup. 

Proof. Let $\mathcal{A}_0(S)$ be the real semigroup algebra of $S$, i.e., the real vector space with
basis $S$ and multiplication determined by the semigroup multiplication in $S$ and
the distributive property,

\[ \Big(\sum_s \alpha_s s\Big)\Big(\sum_t \beta_t t\Big) = \sum_{s,t} \alpha_s\beta_t\,st. \tag{4.24} \]

(Each sum above has finitely many terms.)

If $S$ is the free monoid generated by $r$ symbols, then $\mathcal{A}_0(S)$ is isomorphic to the
set $\wp_{\mathbb{R}}(\mathcal{A})$ of real-valued polynomials, i.e. finitely supported formal series $\mathcal{A}^* \to \mathbb{R}$
(see Definition 4.2).

Extend $F$ from $S$ to a linear functional on $\mathcal{A}_0(S)$, i.e. $F(\sum \alpha_s s) = \sum \alpha_sF(s)$.
Define $I = \{u \in \mathcal{A}_0(S) : F(u) = 0\}$, an ideal in $\mathcal{A}_0(S)$, and the algebra $\mathcal{A} =
\mathcal{A}(S) = \mathcal{A}_0(S)/I$. Define the element $\tau = a_1 + a_2 + \cdots + a_r$ in $\mathcal{A}(S)$ (here $a_i$
abbreviates $a_i + I$) and set $D = \mathcal{A}/\mathcal{A}(e - \tau)$.

The elements of $\mathcal{A}$ and in particular those of $S$ operate on $D$ by left multiplication.
Let $a_i'$ denote the operator induced by left multiplication by $a_i \in S$. Take $\mathcal{C}$
to be the image in $D$ of the set of elements of $\mathcal{A}$ that can be represented as positive
linear combinations of elements in $S$. Denote by $\overline{u}$ the image in $D$ of an element $u$
in $\mathcal{A}$. Set $\theta = \overline{e}$ and let $\theta^*$ be the functional induced on $D$ by $F$ on $\mathcal{A}$ ($F$ vanishes
on $\mathcal{A}(e - \tau)$).

Then the four conditions in the definition of linear stochastic semigroup are
satisfied. This linear stochastic semigroup given by

\[ (\{a_1', \dots, a_r'\}, D, \mathcal{C}, \theta, \theta^*) \tag{4.25} \]

is equivalent to the given $S$ because $F(s') = (\theta^*, s'\theta) = F(s)$. (We will see later that
this construction is closely related to Heller's "stochastic module" construction.)

□



Given a shift-invariant sofic measure $\mu$ on the set of two-sided sequences on the
alphabet $\{1, \dots, r\}$ which assigns positive measure to each symbol, it is possible to
associate an explicit finite-dimensional linear stochastic semigroup to $\mu$ in the same
way that we attached a linear representation in Example 4.15. Here $\mu$ is the image
under some 1-block code $\pi$ of a Markov measure defined from some $m \times m$ stochastic
matrix $P$. For $1 \le i \le r$, let $P_i$ be the $m \times m$ matrix such that $P_i(i',j') = P(i',j')$
if $\pi(j') = i$ and otherwise $P_i(i',j') = 0$. Let $\theta^*$ be a stochastic (probability) left
fixed vector for $P$ and let $\theta$ be the column vector with every entry 1. Let $\mathcal{C}$ be the
cone of all nonnegative vectors in $D = \mathbb{R}^m$. If we identify $P_i$ with the symbol $i$,
then these data give a finite-dimensional linear stochastic semigroup equivalent to
$S(X)$. Along with this observation, Furstenberg established the converse.

Theorem 4.32. [39, Theorem 19.2] A linear stochastic semigroup S is finite di- 
mensional if and only if the stochastic process that it determines is a 1-block factor 
of a 1-step stationary finite-state Markov process. 

In the statement of Theorem 4.32, "Markov" does not presume ergodic. The
construction for the theorem is essentially the one given in Theorem 4.28, with a
simplification. Because of the definition of linear stochastic semigroup (Definition
4.30), Furstenberg can begin with $\theta^*$, $\theta$ actual fixed vectors of $P := \sum_i P_i$. The
triple $(P, \theta^*, \theta)$ corresponds to $(P, x, y)$ in Proposition 4.16, where $x, y$ need not be
fixed vectors. Thus Furstenberg can reduce more quickly to the form where $\theta^*$ and
$\theta$ are positive fixed vectors of $P$. Note that "finite dimensional" in Theorem 4.32
means more than having the cone $\mathcal{C}$ of the linear stochastic semigroup generating
a finite-dimensional space $D$: here $\mathcal{C}$ is a cone in $\mathbb{R}^m$ with exactly $m$ (in particular,
finitely many) extreme rays.
4.4. Sofic measures — Heller's approach. Repeating some problems already
stated, but with some refinements, here are the natural questions about sofic
measures which we are currently discussing, in subshift language.

Problem 4.33. Let $\pi : \Omega_A \to Y$ be a 1-block map from a shift of finite type to
a (sofic) subshift and let $\mu$ be a (fully supported) 1-step Markov measure on $\Omega_A$.
When is $\pi\mu$ Markov? Can one determine what the order (a $k$ such that the measure
is $k$-step Markov) of the image measure might be?

Problem 4.34. Given a shift-invariant probability measure $\nu$ on a subshift $Y$,
when are there a shift of finite type $\Omega_A$, a factor map $\pi : \Omega_A \to Y$, and a 1-step
shift-invariant fully supported Markov measure $\mu$ on $\Omega_A$ such that $\pi\mu = \nu$?

Problem 4.35. If $\nu$ is a sofic measure, how can one explicitly construct Markov
measures of which $\nu$ is a factor? Are there procedures for constructing Markov
measures that map to $\nu$ which have a minimal number of states or minimal entropy?

Problem 4.33 was discussed in [20], for the reversible case. Later complete solu- 
tions depend on Heller's solution of Problem 4.34, so we discuss that first. Effective 
answers to the first part of Problem 4.35 are given by Furstenberg and in the proof 
of Theorem 4.28. 

Problem 4.34 goes back at least to a 1959 paper of Gilbert [40]. Following
Gilbert and Dharmadhikari [23, 24, 25, 26], Heller (1965) created his stochastic
module theory and within this gave a characterization [48, 49] of sofic measures.
We describe this next.

4.4.1. Stochastic module. We describe the stochastic module machinery setup of
Heller [48] (with some differences in notation). Let $S = \{1, 2, \dots, s\}$ be a finite
state space for a stochastic process. Let $\mathcal{A}_S$ be the associative real algebra with
free generating set $S$. An $\mathcal{A}_S$-module is a real vector space $V$ on which $\mathcal{A}_S$ acts
by linear transformations, such that for each $i \in S$ there is a linear transformation
$M_i : V \to V$ such that a word $u_1 \cdots u_k$ sends $v \in V$ to $M_{u_1}(M_{u_2}(\cdots(M_{u_k}(v))\cdots))$. We
denote an $\mathcal{A}_S$-module as $(\{M_i\}, V)$ or for brevity just $\{M_i\}$, where the $M_i$ are the
associated generating linear transformations $V \to V$ as above.

Definition 4.36. A stochastic $S$-module for a stochastic process with state space
$S$ is a triple $(l, \{M_i\}, r)$, where $(\{M_i\}, V)$ is an $\mathcal{A}_S$-module, $r \in V$, $l \in V^*$, and for
every word $u = u_1 \cdots u_t$ on $S$ its probability $\mathrm{Prob}(u) = \mathrm{Prob}(\mathcal{C}_0(u))$ is given by

\[ \mathrm{Prob}(u) = l\,M_{u_1}M_{u_2} \cdots M_{u_t}\,r. \tag{4.26} \]

Given an $\mathcal{A}_S$-module $M$, an $l \in V^*$ and $r \in V$, a few axioms are required to
guarantee that they define a stochastic process with state space $S$. Define $\sigma =
\sum\{a_i : a_i \in S\}$ and denote by $\mathcal{C}_S$ the cone of polynomials in $\mathcal{A}_S$ with nonnegative
coefficients. Then the axioms are that

(1) $lr = 1$;

(2) $l(\mathcal{C}_Sr) \subset [0, \infty)$;

(3) for all $f \in \mathcal{A}_S$, $l(f(\sigma - 1)r) = 0$.
Example 4.37. A stochastic module for a sofic measure. As we saw in Section 4.3,
this setup of a stochastic module arises naturally when a 1-block map $\pi$ is applied to
a 1-step Markov measure $\mu$ with state space $S$ given by an $s \times s$ stochastic transition
matrix $P$ and row probability vector $l$. For each $i \in S$, let $M_i$ be the matrix whose
$j$'th column equals column $j$ of $P$ if $\pi(j) = i$ and whose other columns are zero.
The probability of an $S$-word $u = u_1 \cdots u_t$ is $l\,M_{u_1}M_{u_2} \cdots M_{u_t}\,r$, where $r$ is the vector
of all 1's. With $V = \mathbb{R}^s$, presented as column vectors, $(l, \{M_i\}, r)$ is a stochastic
module for the process given by $\pi\mu$.
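
The construction in Example 4.37 is easy to carry out by machine. The following Python sketch is an illustration under stated assumptions: the three-state doubly stochastic matrix `P`, the lumping map `pi` and the names `module_matrices`, `prob`, `l_vec`, `r_vec` are all hypothetical, and the consistency checks are run only on words of bounded length.

```python
from fractions import Fraction as Fr

def module_matrices(P, pi, symbols):
    """M_i keeps column j of P when pi[j] == i and zeroes every other column."""
    s = len(P)
    return {i: [[P[r][j] if pi[j] == i else Fr(0) for j in range(s)] for r in range(s)]
            for i in symbols}

def prob(l_vec, mats, word, r_vec):
    """Prob(u) = l M_{u_1} ... M_{u_t} r for a word given as a tuple of symbols."""
    row = list(l_vec)
    for i in word:
        M = mats[i]
        row = [sum(row[a] * M[a][b] for a in range(len(row))) for b in range(len(row))]
    return sum(a * b for a, b in zip(row, r_vec))

# Hypothetical chain: 3 states, doubly stochastic P, so the uniform vector is stationary.
P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 2), Fr(0), Fr(1, 2)],
     [Fr(1, 2), Fr(1, 2), Fr(0)]]
pi = [0, 0, 1]                      # states 0 and 1 are lumped to symbol 0
l_vec = [Fr(1, 3)] * 3              # stationary row probability vector
r_vec = [Fr(1)] * 3                 # column vector of all 1's
mats = module_matrices(P, pi, symbols=(0, 1))
```

For this data `prob` is nonnegative, equals 1 on the empty word, and satisfies $\sum_i \mathrm{Prob}(ui) = \mathrm{Prob}(u) = \sum_i \mathrm{Prob}(iu)$, the second equality reflecting stationarity of `l_vec`.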

4.4.2. The reduced stochastic module. A stochastic module $(l, (\{M_i\}, V), r)$ is
reduced if (i) $V$ is the smallest invariant (under the operators $M_i$) vector space
containing $r$ and (ii) $l$ annihilates no nonzero invariant subspace of $V$. Given a
stochastic module $(l, \{M_i\}, r)$ for a stochastic process, with its operators $M_i$ operating
on the real vector space $V$, a smallest stochastic module $(l', \{M_i'\}, r')$ describing
the stochastic process may be defined as follows. Let $R_1$ be the cyclic submodule
of $V$ generated by the action on $r$; let $L_1$ be the cyclic submodule of $V^*$ generated
by the (dual) action on $l$; let $V'$ be $R_1$ modulo the subspace annihilated by $L_1$;
for each $i \in S$ let $M_i'$ be the (well defined) transformation of $V'$ induced by $M_i$;
let $r'$, $l'$ be the elements of $V'$ and $(V')^*$ determined by $r$, $l$. Now $(l', \{M_i'\}, r')$ is
the reduced stochastic module of the process. $V'$ is the subspace generated by the
action of the $M_i'$ on $r'$, and no nontrivial submodule of $V'$ is annihilated by $l'$. The
reduced stochastic module is still a stochastic module for the original stochastic
process. We say "the" reduced stochastic module because any stochastic modules
describing the same stochastic process have isomorphic reduced stochastic modules.

4.4.3. Heller's answer to Problem 4.34. We give some preliminary notation. A
process is "induced from a Markov chain" if its states are lumpings of states of
a finite state Markov process, that is, there is a 1-block code which sends the
associated Markov measure to the measure associated to the stochastic process.
Let $(\mathcal{A}_S)_+$ be the subset of $\mathcal{A}_S$ consisting of linear combinations of words with all
coefficients nonnegative. A cone in a real vector space $V$ is a union of rays from the
origin. A convex cone $C$ is strongly convex if it contains no line through the origin.
It is polyhedral if it is the convex hull of finitely many rays.

Theorem 4.38. Let $(l, (\{M_i\}, V), r)$ be a reduced stochastic module. The associated
stochastic process is induced from a Markov chain if and only if there is a cone $C$
contained in the vector space $V$ such that the following hold:

(1) $r \in C$,

(2) $lC \subset [0, \infty)$,

(3) $(\mathcal{A}_S)_+C \subset C$,

(4) $C$ is strongly convex and polyhedral.

Heller stated this result in [48, Theorem 1]. The proof there contained a mi- 
nor error which was corrected in [49]. Heller defined a process to be finitary if 
its associated reduced stochastic module is finite dimensional. (We will call the 
corresponding measure finitary.) A consequence of Theorem 4.38 is the (obvious)
fact that the reduced stochastic module of a sofic measure must be finitary. Heller
gave an example [48] of a finitary process which is not a 1-block factor of a 1-step 
Markov measure, and therefore is not a factor of any Markov measure. (However, a 
subshift with a weakly mixing finitary measure is measure theoretically isomorphic 
to a Bernoulli shift [12].) 

4.5. Linear automata and the reduced stochastic module for a finitary 
measure. The 1960's and 1970's saw the development of the theory of probabilis- 
tic automata and linear automata. We have not thoroughly reviewed this literature, 
and we may be missing from it significant points of contact with and independent 
invention of the ideas under review. However, we mention at least one. A finite 
dimensional stochastic module is a special case of a linear space automaton, as 
developed in [51] by Inagaki, Fukumura and Matuura, following earlier work on 
probabilistic automata (e.g. [76, 83]). They associated to each linear space automaton
its canonical (up to isomorphism) equivalent irreducible linear space automaton.
When the linear space automaton is a stochastic module, its irreducible linear space 
automaton corresponds exactly to Heller's canonical (up to isomorphism) reduced 
stochastic module. Following [51] and Nasu's paper [70], we will give some concrete 
results on the reduced stochastic module. 

We continue the Example 4.37 and produce a concrete version of the reduced 
stochastic module in the case that a measure on a subshift is presented by a sto- 
chastic module which is finite dimensional as a real vector space (for example, in 
the case of a sofic measure). Our presentation follows a construction of Nasu [70] 
(another is in [51]). Correspondingly, in this section we will reverse Heller's roles 
for row and column vectors and regard the stochastic module as generated by row 
vectors. 

So, let $(u, \{M_i\}, v)$ be a finite dimensional stochastic module on finite alphabet
$\mathcal{A}$. We take the presentation so that there is a positive integer $n$ such that the $M_i$
are $n \times n$ matrices; $u$ and $v$ are $n$-dimensional row and column vectors; and the map
$a \mapsto M_a$ induces a monoid homomorphism $\phi$ from $\mathcal{A}^*$, sending a word $w = a_1 \cdots a_j$
to the matrix $\phi(w) = M_{a_1} \cdots M_{a_j}$.

Let $U$ be the vector space generated by vectors of the form $u\phi(w)$, $w \in \mathcal{A}^*$.
Similarly define $V$ as the vector space generated by vectors of the form $\phi(w)v$,
$w \in \mathcal{A}^*$. Let $k = \dim(U)$. If $k < n$, then construct a smaller module (presenting
the same measure) as follows. Let $L$ be a $k \times n$ matrix whose rows form a basis
of $U$. For each symbol $a$ there exists a $k \times k$ matrix $\overline{M}_a$ such that $LM_a = \overline{M}_aL$.
Define $\overline{u}$ to be the $k$ dimensional row vector such that $\overline{u}L = u$ and set $\overline{v} = Lv$. Let
$a \mapsto \overline{M}_a$ induce a monoid homomorphism $\overline{\phi}$ from $\mathcal{A}^*$, sending a word $w = a_1 \cdots a_j$
to $\overline{\phi}(w) = \overline{M}_{a_1} \cdots \overline{M}_{a_j}$. The subspace $\overline{U}$ of $\mathbb{R}^k$ generated by vectors of the form
$\overline{u}\,\overline{\phi}(w)$ is equal to $\mathbb{R}^k$ because $\overline{U}L = U$ and $\dim(U) = k$. It is easily checked that
$\overline{u}\,\overline{\phi}(w)\,\overline{v} = u\phi(w)v$, for every $w$ in $\mathcal{A}^*$. Let $\overline{V}$ be the subspace of $\mathbb{R}^k$ generated by
column vectors $\overline{\phi}(w)\overline{v}$. We have for each $a$ that $LM_av = \overline{M}_aLv = \overline{M}_a\overline{v}$, so $L$ maps
$V$ onto $\overline{V}$. Also $L$ maps the space of $n$-dimensional column vectors onto $\mathbb{R}^k$. It
follows that if $\dim(V) = n$, then $\dim(\overline{V}) = k$.
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If dim(V) < k, then repeat the reduction move, but applying it to v (column 
vectors) rather than to u. This will give a stochastic module (u, {M a },v), say with 
to x m matrices M a and invariant subspaces U, V generated by the action on u, v. 
By construction we have dim(V) = m. And because U had full dimension, we have 
dim(W) = to also. Regarding V as a space of functionals on U, and letting ker(V) 
denote the subspace of IA annihilated by all elements of V, we see that u h-> u is 
a presentation of the map tt : U — > U/kev(V). Thus (u, {M a },v) is a presentation 
of the reduced stochastic module. Also, for all a, irM a = M a ir, and therefore the 
surjection ir (acting from the right) also satisfies 

(4.27) (E M ^ = 7r (E lg -) • 

a a 

If (u, {M a },v) is another such presentation of the reduced stochastic module, then 
it must have the same (minimal) dimension m, and there will be an invcrtible 
matrix G (giving the isomorphism of the two presentations) such that for all a, 

(4.28) (u, {M a }, vj = (uG, {G-W a G}, G^Vj . 

To find G, simply take to words w such that the vectors u4>{w) are a basis for U, 
and let G be the matrix such that for each of these w, 

(4.29) Tkj)G = u4> . 

The rows of the matrix $L$ above (a basis for the space $\mathcal{U}$) may be obtained by
examining vectors $u\phi(w)$ in some order, with the length of $w$ nondecreasing, and
including as a row any vector not in the span of previous vectors. Let $\mathcal{U}_m$ denote
the space spanned by vectors $u\phi(w)$ with $w$ of length at most $m$. If for some $m$ it
holds that $\mathcal{U}_m = \mathcal{U}_{m+1}$, then $\mathcal{U}_m = \mathcal{U}$. In particular, if $n$ is the dimension of the
original stochastic module, then the matrix $L$ can be found by considering words
of length at most $n - 1$.
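The search for the rows of $L$ just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `vec_mat`, `add_if_independent` and `rows_of_L` are hypothetical, the module is passed as a row vector `u` and a list `mats` of the matrices $M_a$ with exact `Fraction` entries, and only words whose vector was newly independent are extended (which loses nothing, since a dependent vector has dependent descendants).

```python
from fractions import Fraction

def vec_mat(x, M):
    """Row vector x times matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def add_if_independent(echelon, x):
    """Reduce a copy of x against the echelon list; keep the remainder if nonzero."""
    r = list(x)
    for pivot, b in echelon:
        if r[pivot] != 0:
            c = r[pivot]
            r = [ri - c * bi for ri, bi in zip(r, b)]
    for pivot, value in enumerate(r):
        if value != 0:
            echelon.append((pivot, [ri / value for ri in r]))
            return True
    return False

def rows_of_L(u, mats):
    """Vectors u*phi(w) forming a basis of U, with the length of w nondecreasing."""
    echelon, rows = [], []
    frontier = [list(u)]              # vectors u*phi(w) for words w of the current length
    while frontier:
        longer = []
        for x in frontier:
            if add_if_independent(echelon, x):
                rows.append(x)
                longer.extend(vec_mat(x, M) for M in mats)
        frontier = longer
    return rows

# Hypothetical example: a redundant 2-dimensional presentation of the
# Bernoulli (1/2, 1/2) measure, for which U is 1-dimensional.
quarter = Fraction(1, 4)
M = [[quarter, quarter], [quarter, quarter]]
L = rows_of_L([Fraction(1, 2), Fraction(1, 2)], [M, M])
```

The loop stops as soon as one word length contributes no new row, which is the stabilization $\mathcal{U}_m = \mathcal{U}_{m+1}$ used in the text.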

One can check that if two stochastic modules have dimensions $n_1$ and
$n_2$, then they are equivalent (define the same measure) if and only if they assign the
same measure to words of length $n_1 + n_2 - 1$. (This is a special case of [51, Theorem
5.2].) If the reduced stochastic module of a measure has dimension at most $n$, then
one can also construct the reduced stochastic module from the measures of words
of length at most $2n - 1$ (one construction is given in [51, Theorem 6.2]). However,
without additional information about the measure, this forces the examination of
a number of words which for a fixed alphabet can grow exponentially as a function
of $n$, as indicated by the following example.

Example 4.39. Let $X$ be the full shift on the three symbols 0, 1, 2. Given $k \in \mathbb{N}$,
define a stochastic matrix $P$ indexed by $X$-words of length $k + 1$ by $P(10^k, 0^k1) =
1/6 = P(20^k, 0^k2)$; $P(10^k, 0^k2) = 1/2 = P(20^k, 0^k1)$; $P(a_0 \cdots a_k, a_1 \cdots a_{k+1}) = 1/3$
otherwise; and all other entries of $P$ are zero. This matrix defines a $(k + 1)$-step
Markov measure $\mu$ on $X$ which agrees with the Bernoulli (1/3, 1/3, 1/3) measure
on all words of length at most $k + 2$ except the four words $10^k1$, $10^k2$, $20^k1$, $20^k2$.
The reduced stochastic module has dimension at most $2k + 1$, because for any word
$U$ the conditional probability function on $X$-words defined by $\rho_U : W \mapsto \mu(UW \mid U)$
will be a constant multiple of $\rho_V$ for one of the words $V = 0^{k+1}$, $10^j$, $20^j$, with
$0 \le j \le k$. The number of $X$-words of length $k + 2$ is $3^{k+2}$.

4.6. Topological factors of finitary measures, and Nasu's core matrix. The
content of this section is essentially taken from Nasu's paper [70], as we explain in
more detail below. Given a square matrix M, in this section we let M* denote 
any square matrix similar to one giving the action of M on the maximal invariant 
subspace on which the action of M is nonsingular. 

Adapting terminology from [70], we define the core matrix of a finite dimensional
stochastic module given by matrices, $(l, \{M_i\}, r)$, to be $\sum_i M_i$. A core matrix for a
finitary measure $\mu$ is any matrix which is the core matrix of a reduced stochastic
module for $\mu$. This matrix is well defined only up to similarity, but for simplicity
of language we refer to the core matrix of $\mu$, denoted $\mathrm{Core}(\mu)$. Similarly, we define
the eventual core matrix of $\mu$ to be $\mathrm{Core}(\mu)^*$, denoted $\mathrm{Core}^*(\mu)$. E.g., if $\mathrm{Core}(\mu)$ is

(\ °\ 

10 „ *, s . (\ s 

1 
\0 0/ 



then Core*(/x) is [ 2 ^ 



Considering square matrices $M$ and $N$ as linear endomorphisms, we say $N$ is a
quotient of $M$ if there is a linear surjection $\pi$ such that, writing action from the
right, $M\pi = \pi N$. (Equivalently, by duality, the action of $N$ is isomorphic to the
action of $M$ on some invariant subspace.) In this case, the characteristic polynomial

of AI divides that of N (but, e.g. ^ is a principal submatrix of but not a 

/2 1 

quotient of 2 
\0 

Theorem 4.40. Suppose $\phi$ is a continuous factor map from a subshift $X$ onto a
subshift $Y$, $\mu \in \mathcal{M}(X)$ and $\phi\mu = \nu \in \mathcal{M}(Y)$. Suppose $\mu$ is finitary. Then $\nu$ is
finitary, and $\mathrm{Core}^*(\nu)$ is a quotient of $\mathrm{Core}^*(\mu)$. In particular, if $\phi$ is a topological
conjugacy, then $\mathrm{Core}^*(\nu) = \mathrm{Core}^*(\mu)$.

The key to the topological invariance in Theorem 4.40 is the following lemma (a
measure version of [70, Lemma 5.2]). 

Lemma 4.41. Suppose $\mu$ is a finitary measure on a subshift $X$ and $n \in \mathbb{N}$. Let
$X^{[n]}$ be the $n$-block presentation of $X$; let $\psi : X^{[n]} \to X$ be the 1-block factor map
defined on symbols by $[a_1 \cdots a_n] \mapsto a_1$; let $\mu^{[n]} \in \mathcal{M}(X^{[n]})$ be the measure such that
$\psi\mu^{[n]} = \mu$. Then $\mu^{[n]}$ is finitary and $\mathrm{Core}^*(\mu^{[n]})$ is a quotient of $\mathrm{Core}^*(\mu)$.

Proof of Lemma 4.41. For $n > 1$, the $n$-block presentation of $X$ is (after a renaming
of the alphabet) equal to the 2-block presentation of $X^{[n-1]}$. So, by induction it
suffices to prove the lemma for $n = 2$.



Let $(l, \{P_i\}, r)$ be a reduced stochastic module for $\mu$, where the $P_i$ are $k \times k$ and
$\mathcal{A}(X) = \{1, 2, \ldots, m\}$. For each symbol $ij$ of $\mathcal{A}(X^{[2]})$, define an $mk \times mk$ matrix
$P'_{ij}$ as an $m \times m$ system of $k \times k$ blocks, in which the $i,j$ block is $P_i$ and the other
entries are zero. Define $l' = (l, \ldots, l)$ ($m$ copies of $l$) and define

$r' = \begin{pmatrix} P_1 r \\ \vdots \\ P_m r \end{pmatrix}$ .

Then $(l', \{P'_{ij}\}, r')$ is a stochastic module for $\mu^{[2]}$, which is therefore finitary. Also, we
have an elementary strong shift equivalence of the core matrices $P$ and $P'$,

$P = \begin{pmatrix} I & \cdots & I \end{pmatrix} \begin{pmatrix} P_1 \\ \vdots \\ P_m \end{pmatrix}$ , $P' = \begin{pmatrix} P_1 \\ \vdots \\ P_m \end{pmatrix} \begin{pmatrix} I & \cdots & I \end{pmatrix}$ ,

and therefore $P^* = (P')^*$. Because $\mathrm{Core}(\mu^{[2]})$ is a quotient of $P'$, it follows that
$\mathrm{Core}^*(\mu^{[2]})$ is a quotient of $(P')^* = P^* = \mathrm{Core}^*(\mu)$. □



If $\phi : X \to Y$ is a factor map of irreducible sofic shifts of equal entropy, then
$\phi$ must send the unique measure of maximal entropy of $X$, $\mu_X$, to that for $Y$.
These are sofic measures, and consequently Theorem 4.40 gives computable
obstructions to the existence of such a factor map between given $X$ and $Y$. In his
work, Nasu associated to given $X$ a certain linear (not stochastic) automaton. If
we denote it $(l, \{M_i\}, r)$, and let $\log(\lambda)$ denote the topological entropy of $X$, then
$(l, \{(1/\lambda)M_i\}, r)$ would be a stochastic module for $\mu_X$. In the end Nasu's core
matrix is $\lambda\,\mathrm{Core}(\mu_X)$. Nasu remarked in [70] that his arguments could as well be
carried out with respect to measures to obtain his results, and that is what we have
done here.

Eigenvalue relations between core matrices (not so named) of equivalent lin- 
ear automata already appear in [51, Sec. 7]. Also, Kitchens [55] earlier used the 
(Markov) measure of maximal entropy for an irreducible shift of finite type in a 
similar way to show that the existence of a factor map of equal-entropy irreducible 
SFTs, $\Omega_A \to \Omega_B$, implies (in our terminology) that $B^*$ is a quotient of $A^*$. This is
a special case of Nasu's constraint. 



5. When is a sofic measure Markov? 



5.1. When is the image of a 1-step Markov measure under a 1-block
map 1-step Markov? We return to considering Problem 4.33. In this subsection,
suppose $\mu$ is a 1-step Markov measure, that is, a 1-step fully supported
shift-invariant Markov measure on an irreducible shift of finite type $\Omega_A$. Suppose that
$\pi$ is a 1-block code with domain $\Omega_A$. How does one characterize the case when the
measure $\pi\mu$ is again 1-step Markov?

To our knowledge, this problem was introduced, in the language of Markov pro- 
cesses, by Burke and Rosenblatt (1958) [20], who solved it in the reversible case 
[20, Theorem 1]. Kemeny and Snell [54, Theorems 6.4.8 and 6.3.2] gave another 
exposition and introduced the "lumpability" terminology. Kemeny and Snell
defined a (not necessarily stationary) finite-state Markov process X to be lumpable
with respect to a partition of its states if for every initial distribution for X the
corresponding quotient process is Markov. They defined X to be weakly lumpable 
with respect to the partition if there exists an initial distribution for X for which 
the quotient process Y is Markov. In all of this, by Markov they mean 1-step 
Markov. Various problems around these ideas were (and continue to be) explored 
and solved. For now we restrict our attention to the question of the title of this 
subsection and describe three answers. 

5.1.1. Stochastic module answer. 

Theorem 5.1. Let $(l, M, r)$ be a presentation of the reduced stochastic module of a
sofic measure $\nu$ on $Y$, in which $M_i$ denotes the matrix by which a symbol $i$ of $\mathcal{A}(Y)$
acts on the module. Suppose $k \in \mathbb{N}$. Then the sofic measure $\nu$ is $k$-step Markov if
and only if every product $M_{i(1)} \cdots M_{i(k)}$ of length $k$ has rank at most 1.

The case k = 1 of Theorem 5.1 was proved by Heller [48, Prop. 3.2]. An equivalent
characterization was given a good deal later, evidently without awareness of Heller's 
work, by Bosch [15], who worked from the papers of Gilbert [40] and Dharmadhikari 
[23]. The case of general k in Theorem 5.1 was proved by Holland [50, Theorem 4], 
following Heller. 
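The rank criterion of Theorem 5.1 is easy to test by brute force once the matrices of a reduced stochastic module are in hand. The following Python sketch assumes exactly that: `mats` is a list of the matrices $M_i$ of a module already known to be reduced, $k \ge 1$, and the helper names (`mat_mul`, `rank`, `products_have_rank_at_most_one`) are hypothetical. It enumerates all $|\mathcal{A}(Y)|^k$ products, so it is only practical for small alphabets and small $k$.

```python
from fractions import Fraction
from itertools import product

def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def products_have_rank_at_most_one(mats, k):
    """True if every product of k matrices from mats has rank at most 1."""
    for word in product(range(len(mats)), repeat=k):
        M = mats[word[0]]
        for a in word[1:]:
            M = mat_mul(M, mats[a])
        if rank(M) > 1:
            return False
    return True

# Module of the 1-step Markov measure with P = [[1/2, 1/2], [1, 0]]:
# M_j keeps column j of P and zeroes the other column.
half = Fraction(1, 2)
markov_mats = [[[half, 0], [1, 0]], [[0, half], [0, 0]]]
# Toy matrix (not claimed to come from a measure): rank 2, but its square has rank 1.
toy = [[[0, 1, 0], [0, 0, 1], [0, 0, 0]]]
```

The toy matrix only illustrates how the answer can depend on $k$: the test fails for $k = 1$ and succeeds for $k = 2$.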

5.1.2. Linear algebra answer. One can approach the problem of deciding whether a 
sofic measure is Markov with straight linear algebra. There is a large literature using 
such ideas in the context of automata, control theory and the "lumpability" strand 
of literature emanating from Kemeny and Snell (see e.g. [41] and its references). 
Propositions 5.2 and 5.3 and Theorem 5.4 are taken from Gurvits and Ledoux [41]. 
As with previous references, we are considering only a fragment of this one. 

Let $N$ be the size of the alphabet of the irreducible shift of finite type $\Omega_A$. Let
$\pi$ be a 1-block code mapping $\Omega_A$ onto a subshift $Y$. Let $P$ be an $N \times N$ irreducible
stochastic matrix defining a 1-step Markov measure $\mu$ on $\Omega_A$. Let $p$ be the positive
stochastic row fixed vector of $P$. Let $U$ be the matrix such that $U(i,j) = 1$ if $\pi$
maps the state $i$ to the state $j$, and $U(i,j) = 0$ otherwise. Given $i \in \mathcal{A}(\Omega_A)$, let $\overline{i}$
be its image symbol in $Y$. Given $j \in \mathcal{A}(Y)$, let $P_j$ be the matrix of size $P$ which
equals $P$ in columns $i$ such that $\overline{i} = j$, and is zero in other entries. Likewise define
$p_j$. Given a $Y$-word $w = j_1 \cdots j_k$, we let $P_w = P_{j_1} \cdots P_{j_k}$.

Alert: We are using parenthetical notation for matrix and vector entries and
subscripts for lists. If $\pi\mu$ is a 1-step Markov measure on $Y$, then it is defined using
a stochastic row vector $q$ and stochastic matrix $Q$. The vector $q$ can only be $pU$,
and the entries of $Q$ are determined by $Q(j,k) = (p_j P_k U)(k)/q(j)$. Let $\nu$ denote
the Markov measure defined using $q, Q$. Define $q_j, Q_j$ by replacing entries of $q, Q$
with zero in columns not indexed by $j$. For a word $w = j_0 \cdots j_k$ on symbols from
$\mathcal{A}(Y)$, we have $(\pi\mu)(C_0(w)) = \nu(C_0(w))$ if and only if

(5.1) $p_{j_0} P_{j_1} \cdots P_{j_k} U = p_{j_0} U Q_{j_1} \cdots Q_{j_k}$

(since $q_{j_0} = p_{j_0} U$). Thus $\pi\mu = \nu$ if and only if (5.1) holds for all $Y$-words $w$. This
remark is already more or less in Kemeny and Snell [54, Theorem 6.4.1].

For the additional argument which produces a finite procedure, we define certain
vector spaces (an idea already in [31, 56, 86, 87, 41] and elsewhere).

Let $\mathcal{V}_k$ denote the real vector space generated by the row vectors $p_{j_0} P_{j_1} \cdots P_{j_t}$
such that $j_0 j_1 \cdots j_t$ is a $Y$-word and $0 \le t \le k$. So, $\mathcal{V}_0$ is the vector space generated
by the vectors $p_j$, and $\mathcal{V}_{k+1}$ is the subspace generated by $\mathcal{V}_k \cup \{vP_j : v \in \mathcal{V}_k, j \in
\mathcal{A}(Y)\}$. In fact, for $k \ge 0$, we claim that

(5.2) $\mathcal{V}_k = \langle \{ p_{j_0} P_{j_1} \cdots P_{j_k} : j_0 \cdots j_k \in \mathcal{A}(Y)^{k+1} \} \rangle$ , and

(5.3) $\mathcal{V}_{k+1} = \langle \{ vP_j : v \in \mathcal{V}_k, j \in \mathcal{A}(Y) \} \rangle$ ,

where $\langle\ \rangle$ is used to denote span. Clearly (5.3) follows from (5.2), which is a
consequence of stationarity, as follows. Because $\sum_j p_j = p = pP = \sum_j pP_j$, and
for $i \ne j$ the vectors $p_i$ and $pP_j$ cannot both be nonzero in any coordinate, we
have $p_j = pP_j$. So, given $t$ and $j_1 \cdots j_t$, we have

$p_{j_1} P_{j_2} \cdots P_{j_t} = pP_{j_1} P_{j_2} \cdots P_{j_t} = \sum_{j_0} p_{j_0} P_{j_1} P_{j_2} \cdots P_{j_t}$ ,

from which (5.2) easily follows. Let $\mathcal{V} = \langle \bigcup_{k \ge 0} \mathcal{V}_k \rangle$.

Proposition 5.2. Suppose $P$ is an $N \times N$ irreducible stochastic matrix and $\phi$ is a
1-block code. Let the vector spaces $\mathcal{V}_k$ be defined as above, and let $n$ be the smallest
positive integer such that $\mathcal{V}_n = \mathcal{V}_{n+1}$. Then $n \le N - |\mathcal{A}(Y)|$, $\mathcal{V}_n = \mathcal{V}$, and the
following are equivalent:

(1) $\phi\mu$ is a 1-step Markov measure on the image of $\phi$.

(2) $p_{j_0} P_{j_1} \cdots P_{j_n} U = p_{j_0} U Q_{j_1} \cdots Q_{j_n}$, for all $j_0 \cdots j_n \in \mathcal{A}(Y)^{n+1}$.

Proof. For $k \ge 1$, we have $\mathcal{V}_k \subset \mathcal{V}_{k+1}$, and also

(5.4) $\mathcal{V}_k = \mathcal{V}_{k+1}$ implies $\mathcal{V}_k = \mathcal{V}_t = \mathcal{V}$ for all $t \ge k$ .

Because $\dim(\mathcal{V}_0) = |\mathcal{A}(Y)|$, it follows that $n \le N - |\mathcal{A}(Y)|$.

Because (1) is equivalent to (5.1) holding for all $Y$-words $j_0 j_1 \cdots j_k$, $k \ge 0$, we
have that (1) implies (2).

Now suppose (2) holds. For $K \ge 1$, the linear condition (5.1) holds for all
$Y$-words of length $k$ less than or equal to $K$ if and only if $vUQ_j = vP_jU$ for
all $j$ in $\mathcal{A}(Y)$ and all $v$ in $\mathcal{V}_K$. ($U$ is the matrix defined above.) Because $\mathcal{V}_K =
\mathcal{V}_n$ for $K \ge n$, we conclude from (2) and (5.2) that (5.1) holds for all $Y$-words
$j_0 j_1 \cdots j_k$, $k \ge 0$, and therefore (1) holds. □
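The finite procedure behind Proposition 5.2 can be sketched as follows. This is a minimal illustration, not the algorithm of [41]: it builds a spanning set of $\mathcal{V}$ by closing the vectors $p_j$ under right multiplication by the $P_j$, and tests the linear condition $vP_jU = vUQ_j$ from the proof on each spanning vector. The function names are hypothetical, states and symbols are assumed to be numbered from 0, entries are exact `Fraction`s, and the stationary vector `p` is assumed to be supplied by the caller.

```python
from fractions import Fraction

def vec_mat(x, M):
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def add_if_independent(echelon, x):
    """Reduce a copy of x against the echelon list; keep the remainder if nonzero."""
    r = list(x)
    for pivot, b in echelon:
        if r[pivot] != 0:
            c = r[pivot]
            r = [ri - c * bi for ri, bi in zip(r, b)]
    for pivot, value in enumerate(r):
        if value != 0:
            echelon.append((pivot, [ri / value for ri in r]))
            return True
    return False

def image_is_one_step_markov(P, p, code, m):
    """P: N x N stochastic matrix; p: stationary row vector (pP = p);
    code[i] in range(m) is the image symbol of state i."""
    N, zero = len(P), Fraction(0)
    U = [[Fraction(int(code[i] == j)) for j in range(m)] for i in range(N)]
    Pj = [[[P[a][b] if code[b] == j else zero for b in range(N)] for a in range(N)]
          for j in range(m)]
    pj = [[p[i] if code[i] == j else zero for i in range(N)] for j in range(m)]
    q = vec_mat(p, U)
    Q = [[vec_mat(vec_mat(pj[j], Pj[k]), U)[k] / q[j] for k in range(m)]
         for j in range(m)]
    Qj = [[[Q[a][b] if b == j else zero for b in range(m)] for a in range(m)]
          for j in range(m)]
    echelon, queue = [], list(pj)          # start from the generators of V_0
    while queue:
        v = queue.pop(0)
        if not add_if_independent(echelon, v):
            continue                        # v adds nothing new to the span
        for j in range(m):
            if vec_mat(vec_mat(v, Pj[j]), U) != vec_mat(vec_mat(v, U), Qj[j]):
                return False
            queue.append(vec_mat(v, Pj[j]))
    return True

F = Fraction
third = F(1, 3)
# Lumpable chain: states 1 and 2 are merged and have equal block transition sums.
lumpable = [[F(1, 2), F(1, 4), F(1, 4)], [F(1, 4), F(1, 2), F(1, 4)], [F(1, 4), F(1, 4), F(1, 2)]]
# Deterministic 3-cycle with states 0 and 1 merged: the image is the orbit of (aab)^infinity.
cycle = [[F(0), F(1), F(0)], [F(0), F(0), F(1)], [F(1), F(0), F(0)]]
```

In the cycle example the image gives the word aaa measure 0 although aa has positive measure, so the image cannot be 1-step Markov; the test detects this at the vector $p_a P_a$.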

Next we consider an irreducible $N \times N$ matrix $P$ defining a 1-step Markov
measure $\mu$ on $\Omega_A$ and a 1-block code $\phi$ from $\Omega_A$ onto a subshift $Y$. Given a positive
integer $k \ge 1$, we are interested in understanding when $\phi\mu$ is $k$-step Markov. We
use notations $U, p, p_j, P_j, \mathcal{V}_t$ and $\mathcal{V}_n = \mathcal{V}$ as above. Define a stochastic row vector
$q$ indexed by $Y$-words of length $k$, with $q(j_0 \cdots j_{k-1}) = (p_{j_0} P_{j_1} \cdots P_{j_{k-1}} U)(j_{k-1})$.
Let $Q$ be the square matrix indexed by $Y$-words of length $k$ whose nonzero entries
are defined by

$Q(j_0 \cdots j_{k-1}, j_1 \cdots j_k) = \dfrac{(p_{j_0} P_{j_1} \cdots P_{j_k} U)(j_k)}{q(j_0 \cdots j_{k-1})}$ .

Then $Q$ is an irreducible stochastic matrix and $q$ is a positive stochastic vector such
that $qQ = q$. Let $\nu$ be the $k$-step Markov measure defined on $Y$ by $(q, Q)$. The
measures $\nu$ and $\phi\mu$ agree on cylinders $C_0(j_0 \cdots j_k)$ and therefore on all cylinders
$C_0(j_0 \cdots j_t)$ with $0 \le t \le k$. Clearly, if $\phi\mu$ is $k$-step Markov then $\phi\mu$ must equal $\nu$.

Proposition 5.3. [41] Suppose $P$ is an $N \times N$ irreducible stochastic matrix defining
a 1-step Markov measure $\mu$ on $\Omega_A$ and $\phi : \Omega_A \to Y$ is a 1-block code. Let $k$ be a
fixed positive integer. With the notations above, the following are equivalent.

(1) $\phi\mu$ is a $k$-step Markov measure (i.e., $\phi\mu = \nu$).

(2) For every $Y$-word $w = w_0 \cdots w_{k-1}$ of length $k$ and every $v \in \mathcal{V}$,

(5.5) $vP_w(PU - \mathbf{1}Q_w) = 0$ ,

where $P_w = P_{w_0} \cdots P_{w_{k-1}}$; $\mathbf{1}$ is the size $N$ column vector with every entry
1; and $Q_w$ is the stochastic row vector defined by

(5.6) $Q_w(j) = Q(w_0 \cdots w_{k-1}, w_1 \cdots w_{k-1}j)$ , $j \in \mathcal{A}(Y)$.



Proof. We continue to denote by $z(j)$ the entry in the $j$'th coordinate of a row
vector $z$. By construction of $\nu$ we have for $t = 0$ that

(5.7) $(\pi\mu)C_0(j_0 \cdots j_{t+k}) = \nu C_0(j_0 \cdots j_{t+k})$ for all $j_0 \cdots j_{t+k} \in \mathcal{A}(Y)^{t+k+1}$ .

Now suppose $t$ is a nonnegative integer and (5.7) holds for $t$. Given $j_0 \cdots j_{t+k}$, let
$w$ be its terminal word of length $k$. Then for $j \in \mathcal{A}(Y)$,

$(\pi\mu)C_0(j_0 \cdots j_{t+k}j) - \nu C_0(j_0 \cdots j_{t+k}j)$

$= (p_{j_0} P_{j_1} \cdots P_{j_{t+k}} P_j U)(j) - (\nu C_0(j_0 \cdots j_{t+k}) Q_w)(j)$

$= (p_{j_0} P_{j_1} \cdots P_{j_{t+k}} P_j U)(j) - ((p_{j_0} P_{j_1} \cdots P_{j_{t+k}} \mathbf{1}) Q_w)(j)$

$= (p_{j_0} P_{j_1} \cdots P_{j_{t+k}} [P_j U - \mathbf{1}Q_w])(j)$

$= (p_{j_0} P_{j_1} \cdots P_{j_t} P_w [PU - \mathbf{1}Q_w])(j)$ ,

where the term $P_{j_1} \cdots P_{j_t}$ is included only if $t > 0$, and the last equality holds
because the $j$th columns of $PU$ and $P_jU$ are equal. Thus, given (5.7) for $t$, by (5.2)
we have (5.7) for $t + 1$ if and only if $vP_w[PU - \mathbf{1}Q_w] = 0$ for all $v \in \mathcal{V}_t$ and all $w$ of
length $k$. It follows from induction that (5.7) holds for all $t \ge 0$ (i.e. $\pi\mu = \nu$) if
and only if (5.5) holds for all $v \in \mathcal{V}$. □



Because $\mathcal{V}$ can be computed, Proposition 5.3 gives an algorithm, given $k$, for
determining whether the image of a 1-step Markov measure is a $k$-step Markov
measure. The next result gives a criterion which does not require computation of
the matrix $Q$.

Theorem 5.4. [41] Let notations be as in Proposition 5.3. Then $\phi\mu$ is a $k$-step
Markov measure on $Y$ if and only if for every $Y$-word $w$ of length $k$,

(5.8) $((\mathcal{V}P_w) \cap \ker(U))P \subset \ker(U)$ .

Proof. Let $w = w_0 \cdots w_{k-1}$ be a $Y$-word of length $k$. Using the computations of
the proof of Proposition 5.3, we obtain for $j \in \mathcal{A}(Y)$ that

$0 = \pi\mu C_0(w_0 \cdots w_{k-1}j) - \nu C_0(w_0 \cdots w_{k-1}j)$

$= (p_{w_0} P_{w_1} \cdots P_{w_{k-1}} [PU - \mathbf{1}Q_w])(j)$

$= (pP_{w_0} P_{w_1} \cdots P_{w_{k-1}} [PU - \mathbf{1}Q_w])(j)$

$= (pP_w[PU - \mathbf{1}Q_w])(j)$ .

Consequently, the vector $v = p$ satisfies (5.5). Moreover,

$(pP_wU)(w_{k-1}) = (p_{w_0} P_{w_1} \cdots P_{w_{k-1}} U)(w_{k-1}) = \pi\mu C_0(w) > 0$,

and therefore $pP_w \notin \ker(U)$. Because $vP_wU = 0$ if and only if $vP_w\mathbf{1} = 0$, the space
$\mathcal{V}P_w$ is spanned by $pP_w$ and $(\mathcal{V}P_w) \cap \ker(U)$. Thus (5.5) holds for all $v \in \mathcal{V}$ if and
only if (5.5) holds for all $v \in \mathcal{V}$ such that $vP_w \in \ker(U)$, which is equivalent to
(5.8). □

Gurvits and Ledoux [41, Sec. 2.2.2] explain how Theorem 5.4 can be used to
produce an algorithm, polynomial in the number $N$ of states, for deciding whether
$\pi\mu$ is a 1-step Markov measure.

5.2. Orders of Markov measures under codes. This section includes items 
relevant to the second part of Problem 4.33. 

Definition 5.5. Given positive integers $m, n, k$ with $1 \le k \le n$, recursively define
integers $N(k,m,n)$ by setting

(5.9) $N(n,m,n) = 1$

(5.10) $N(k,m,n) = (1 + m^{N(k+1,m,n)})\,N(k+1,m,n)$ , if $1 \le k < n$ .
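To get a feeling for the size of these numbers, the recursion (5.9)-(5.10) can be evaluated directly. The short Python sketch below is only an illustration; the function name `N` mirrors the notation of the definition, and exact integer arithmetic is used.

```python
def N(k, m, n):
    """N(k, m, n) from Definition 5.5, for positive integers with 1 <= k <= n."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    value = 1                          # N(n, m, n) = 1
    for _ in range(n - k):             # step down from n to k using (5.10)
        value = (1 + m ** value) * value
    return value

# With m = 2 symbols: N(2, 2, 3) = 3, N(2, 2, 4) = 27, N(2, 2, 5) = 3623878683.
bounds = [N(2, 2, n) for n in (3, 4, 5)]
```

Already $N(2, 2, 6)$ involves the power $2^{3623878683}$, so the values should not be computed much further than this.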

Proposition 5.6. Suppose $\pi : \Omega_A \to Y$ is a 1-block code and $\mu$ is a 1-step Markov
measure on $\Omega_A$. Let $n$ be the dimension of the reduced stochastic module of $\pi\mu$
and let $m = |\mathcal{A}(Y)|$. Suppose $n \ge 2$. (In the case $n = 1$, $\pi\mu$ is Bernoulli.) Let
$K = N(2,m,n)$. If $\pi\mu$ is not $K$-step Markov, then it is not $k$-step Markov for any
$k$.

Before proving Proposition 5.6, we state our main interest in it. 

Corollary 5.7. Suppose $\mu$ is a 1-step Markov measure on an irreducible SFT
$\Omega_A$ determined by a stochastic matrix $P$, and that there are algorithms for doing
arithmetic in the field generated by the entries of $P$. Suppose $\phi$ is a block code on
$\Omega_A$. Then there is an algorithm for deciding whether the measure $\phi\mu$ is Markov.

Proof. The corollary is an easy consequence of Propositions 5.2 and 5.6. □

The proof of Proposition 5.6 uses two lemmas. 

Lemma 5.8. Suppose $P_1, \ldots, P_t$ are $n \times n$ matrices such that $\mathrm{rank}(P_1 \cdots P_t P_1) =
\mathrm{rank}(P_1) = r$. Then for all positive integers $m$, $\mathrm{rank}((P_1 \cdots P_t)^m P_1) = r$.

Proof. It follows from the rank equality that $(P_1 \cdots P_t)$ defines an isomorphism from
the image of $P_1$ (a vector space of column vectors) to itself. □

Lemma 5.9. Suppose $k, m, n$ are positive integers and $1 \le k \le n$. Suppose $\mathcal{Q}$ is
a collection of $m$ matrices of size $n \times n$, and there exists a product of $N(k,m,n)$
matrices from $\mathcal{Q}$ with rank at least $k$. Then there are arbitrarily long products of
matrices from $\mathcal{Q}$ with rank at least $k$.

Proof. We prove the lemma by induction on $k$, for $k$ decreasing from $n$. The
case $k = n$ is clear. Suppose now $1 \le k < n$ and the lemma holds for $k + 1$. Suppose
a matrix $M$ is a product $Q_{i(1)} \cdots Q_{i(N(k,m,n))}$ of $N(k,m,n)$ matrices from $\mathcal{Q}$ and
has rank at least $k$. We must show there are arbitrarily long products from $\mathcal{Q}$ with
rank at least $k$.

The given product is a concatenation of products of length $N(k+1,m,n)$, and
we define corresponding matrices,

(5.11) $P_j = Q_{i(1+(j-1)N(k+1,m,n))} \cdots Q_{i(jN(k+1,m,n))}$ , $1 \le j \le 1 + m^{N(k+1,m,n)}$ .

If any $P_j$ has rank at least $k + 1$, then by the induction hypothesis there are
arbitrarily long products with rank at least $k + 1$, and we are done. So, suppose
every $P_j$ has rank at most $k$. Because $\mathrm{rank}(P_j) \ge \mathrm{rank}(M) \ge k$, it follows that $M$,
and every $P_j$, and every subproduct of consecutive $P_j$'s, has rank $k$.

There are only $m^{N(k+1,m,n)}$ words of length $N(k+1,m,n)$ on $m$ symbols, so
two of the matrices $P_j$ must be equal. The conclusion now follows from Lemma
5.8. □



Proof of Proposition 5.6. As described in Examples 4.37 and 4.5, there are
algorithms for producing the reduced stochastic module for $\pi\mu$ as a set of matrices
$M_a$ (one for each symbol from $\mathcal{A}(Y)$) and a pair of vectors $u, v$ such that for
any $Y$-word $a_1 \cdots a_t$, $(\pi\mu)C_0(a_1 \cdots a_t) = uM_{a_1} \cdots M_{a_t}v$. By Theorem 5.1, $\pi\mu$ is
$k$-step Markov if and only if every product $M_{a_1} \cdots M_{a_k}$ has rank at most 1. Let
$K = N(2,m,n)$. If $\pi\mu$ is not $K$-step Markov, then some matrix $\prod_{i=1}^{K} M_{a(i)}$ has
rank at least 2, and by Lemma 5.9 there are then arbitrarily long products of $M_a$'s
with rank at least 2. By Theorem 5.1, this shows that $\pi\mu$ is not $k$-step Markov for
any $k$. □

Remark 5.10. Given m and n, the numbers N(k,m,n) grow very rapidly as k 
decreases. Consequently, the bound K in Proposition 5.6 (and consequently the 
algorithm of Corollary 5.7) is not practical. However, in an analogous case (Problem 
5.13 below) we don't even know the existence of an algorithm.

Problem 5.11. Find a reasonable bound K for Proposition 5.6.

Example 5.12. This is an example to show that the cardinality of the domain
alphabet cannot be used as the bound $K$ in Proposition 5.6. Given $n > 1$ in $\mathbb{N}$,
let $A$ be the adjacency matrix of the directed graph $\mathcal{G}$ which is the union of two
cycles, $a_1 b_1 b_2 \cdots b_{n+4} a_1$ and $a_2 b_3 b_4 \cdots b_{n+3} a_2$. The vertex set $\{a_1, a_2, b_1, \ldots, b_{n+4}\}$
is the alphabet $\mathcal{A}$ of $\Omega_A$. Let $\phi$ be the 1-block code defined by erasing subscripts,
and let $Y$ be the subshift which is the image of $\phi$, with alphabet $\{a, b\}$. Let $\mu$
be any 1-step Markov measure on $\Omega_A$. In $\mathcal{G}$, there are exactly four first return
paths from $\{a_1, a_2\}$ to $\{a_1, a_2\}$: $a_1 b_1 \cdots b_{n+4} a_1$, $a_1 b_1 \cdots b_{n+3} a_2$, $a_2 b_3 \cdots b_{n+4} a_1$ and
$a_2 b_3 \cdots b_{n+3} a_2$. Thus, in a point of $Y$, successive occurrences of the symbol $a$ must
correspondingly be separated by $m$ $b$'s, with $m \in \{n+4, n+3, n+2, n+1\}$. Each
$Y$-word $ab^ma$ has a unique preimage word, so $\phi : \Omega_A \to Y$ is a topological conjugacy.
Thus $\phi\mu$ is $k$-step Markov for some $k$. We have

$\phi(b_1 \cdots b_{n+3} a_2 b_3 \cdots b_{n+3} a_2) = (b^{n+3}ab^{n+1})a$ , and
$\phi(a_1 b_1 \cdots b_{n+4} a_1 b_1 \cdots b_{n+1}) = ab(b^{n+3}ab^{n+1})$ .

So, $(b^{n+3}ab^{n+1})a$ and $ab(b^{n+3}ab^{n+1})$ are $Y$-words, but $ab(b^{n+3}ab^{n+1})a$ is not a
$Y$-word. Consequently, we have conditional probabilities,

$\phi\mu[y_0 = a \mid y_{-(2n+5)} \cdots y_{-1} = (b^{n+3}ab^{n+1})] > 0$ ,

$\phi\mu[y_0 = a \mid y_{-(2n+7)} \cdots y_{-1} = ab(b^{n+3}ab^{n+1})] = 0$ ,

which shows that $\phi\mu$ cannot be $(2n + 5)$-Markov. In contrast, $|\mathcal{A}| = n + 6 < 2n + 5$.

With regard to the problem (3.3) of determining whether a given factor map is 
Markovian, the analogue of Proposition 5.6 is the following open problem. 

Problem 5.13. Find (or prove there does not exist) an algorithm for attaching
to any 1-block code $\phi$ from an irreducible shift of finite type a number $N$ with
the following property: if a 1-step Markov measure $\mu$ on the range of $\phi$ has no
preimage measure which is $N$-step Markov, then $\mu$ has no preimage measure which
is Markov.

Remark 5.14. (The persistence of memory) Suppose $\phi : \Omega_A \to \Omega_B$ is a 1-block
code from one irreducible 1-step SFT onto another. We collect some facts on how
the memory of a Markov measure and a Markov image must or can be related.

(1) The image of a 1-step Markov measure can be Markov but not 1-step
Markov. (E.g. the standard map from the $k$-block presentation to the
1-block presentation takes the 1-step Markov measures onto the $k$-step
Markov measures.)

(2) If $\phi$ is finite-to-one and $\nu$ is $k$-step Markov on $\Omega_B$, then there is a unique
Markov measure $\mu$ on $\Omega_A$ such that $\phi\mu = \nu$, and $\mu$ is also $k$-step Markov
(Proposition 3.18).

(3) If any 1-step Markov measure on $\Omega_B$ lifts to a $k$-step Markov measure on
$\Omega_A$, then for every $n$, every $n$-step Markov measure on $\Omega_B$ lifts to an $(n+k)$-step
Markov measure on $\Omega_A$. (This follows from the explicit construction
(3.2) and passage as needed to a higher block presentation.)

(4) If $\phi$ is infinite-to-one then it can happen [18, Section 2] ("peculiar memory
example") that every 1-step Markov measure on $\Omega_B$ lifts to a 2-step Markov
measure on $\Omega_A$ but not to a 1-step Markov measure, while every 1-step
Markov measure on $\Omega_A$ maps to a 2-step Markov measure on $\Omega_B$.



6. Resolving maps and Markovian maps 



In this section, $\Omega_A$ denotes an irreducible 1-step shift of finite type defined by
an irreducible matrix $A$.



6.1. Resolving maps. In this section, $\pi : \Omega_A \to Y$ is a 1-block code onto a
subshift $Y$, with $Y$ not necessarily a shift of finite type, unless specified. $U$ denotes
the 0,1 $|\mathcal{A}(\Omega_A)| \times |\mathcal{A}(Y)|$ matrix such that $U(i,j) = 1$ iff $\pi(i) = j$. Denote a
symbol $(\pi x)_0$ by $\overline{x_0}$.

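Definition 6.1. The factor map $\pi$ as above is right resolving if for all symbols
$i, \overline{i}, k$ such that $\overline{i}k$ occurs in $Y$, there is at most one $j$ such that $ij$ occurs in $\Omega_A$
and $\overline{j} = k$. In other words, for any diagram

(6.1) $\begin{array}{ccc} i & & \\ \downarrow & & \\ \overline{i} & \to & k \end{array}$

there is at most one $j$ such that

(6.2) $\begin{array}{ccc} i & \to & j \\ \downarrow & & \downarrow \\ \overline{i} & \to & k \end{array}$

Definition 6.2. A factor map $\pi$ as above is right e-resolving if it satisfies the
definition above, with "at most one" replaced by "at least one".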



Reverse the roles of $i$ and $j$ above to define left resolving and left e-resolving. A
map $\pi$ is resolving (e-resolving) if it is left or right resolving (e-resolving).
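For a concrete 1-block code these conditions can be checked mechanically from the adjacency matrix. The Python sketch below is an illustration under stated assumptions: `A` is a 0/1 adjacency matrix of an irreducible graph (so every allowed 2-block of $\Omega_A$ extends to a point, and the $Y$-words of length 2 are exactly the images of allowed 2-blocks), `code[i]` is the image symbol of state $i$, and the function names are hypothetical.

```python
def follower_counts(A, code):
    """For each state i and symbol k such that (code[i], k) is a Y-word of length 2,
    count the states j with ij allowed in Omega_A and code[j] == k."""
    n = len(A)
    y_words = {(code[i], code[j]) for i in range(n) for j in range(n) if A[i][j]}
    return {(i, k): sum(1 for j in range(n) if A[i][j] and code[j] == k)
            for i in range(n) for (a, k) in y_words if a == code[i]}

def is_right_resolving(A, code):
    """'At most one' j for every diagram (6.1)."""
    return all(c <= 1 for c in follower_counts(A, code).values())

def is_right_e_resolving(A, code):
    """'At least one' j for every diagram (6.1)."""
    return all(c >= 1 for c in follower_counts(A, code).values())

# Full 2-shift collapsed to a single symbol: right e-resolving, not right resolving.
full = [[1, 1], [1, 1]]
# A 3-state graph with edges 0->0, 0->1, 1->2, 2->0 and code a, b, a:
# right resolving, but state 2 has no follower mapping to b although ab is a Y-word.
G = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The left-handed versions are obtained by applying the same functions to the transpose of `A`.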

Proposition 6.3. (1) If $\pi$ is resolving, then $h(\Omega_A) = h(Y)$.

(2) If $Y = \Omega_B$ and $h(\Omega_A) = h(\Omega_B)$, then $\pi$ is e-resolving iff $\pi$ is resolving.

(3) If $\pi$ is e-resolving, then $Y$ is a 1-step shift of finite type, $\Omega_B$.

(4) If $\pi$ is e-resolving and $k \in \mathbb{N}$, then every $k$-step Markov measure on $Y = \Omega_B$
lifts to a $k$-step Markov measure on $\Omega_A$.



Proof. (1) This holds because a resolving map must be finite-to-one [66, 58].

(2) We argue as in [66, 58]. Suppose $\pi$ is right-resolving. This means precisely
that $AU \le UB$. If $AU \ne UB$, then it would be possible to increase some entry of $A$
by one and have a resolving map onto $\Omega_B$ from some irreducible SFT $\Omega_C$ properly
containing $\Omega_A$. But now $h(\Omega_C) > h(\Omega_A)$, while $h(\Omega_C) = h(\Omega_B) = h(\Omega_A)$ because
the resolving maps respect entropy. This is a contradiction. The other direction
holds by a similar argument.

(3) This is an easy exercise [18].

(4) We consider $k = 1$ (the general case follows by passage to the higher block
presentation). Suppose $\pi$ is right e-resolving. This means that $AU \ge UB$. Suppose
$Q$ is a stochastic matrix defining a 1-step Markov measure $\nu$ on $\Omega_B$. For each
positive entry $B(k,\ell)$ of $B$ and $i$ such that $\overline{i} = k$, let $J(i,k,\ell)$ be the set of
indices $j$ such that $A(i,j) > 0$ and $\pi(j) = \ell$. Now simply choose $P$ to be any
nonnegative matrix of size and zero/positive pattern matching $A$ such that for each
$i, k, \ell$, $\sum_{j \in J(i,k,\ell)} P(i,j) = Q(k,\ell)$. Then $PU = UQ$, and this guarantees that
$\pi\mu = \nu$. The condition on the +/0 pattern guarantees that $\mu$ has full support on
$\Omega_A$. (The code $\pi$ in Example 3.4 is right e-resolving, and (3.4) gives an example of
this construction.) □
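The choice of $P$ in part (4) can be made explicit. The Python sketch below is one possible choice only, not the construction of the source: it splits each $Q(k,\ell)$ equally among the indices in $J(i,k,\ell)$, whereas the proof allows any positive splitting. The function names are hypothetical, symbols of $\Omega_B$ are assumed to be numbered from 0, and the map is assumed to be right e-resolving.

```python
from fractions import Fraction

def lift_by_equal_splitting(A, code, Q):
    """Return P with the zero/positive pattern of A such that, for a right
    e-resolving 1-block code, the sum of P(i, j) over j in J(i, k, l) is Q(k, l)."""
    n = len(A)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if A[i][j]:
                # size of J(i, k, l) with k = code[i], l = code[j]
                size = sum(1 for t in range(n) if A[i][t] and code[t] == code[j])
                P[i][j] = Fraction(Q[code[i]][code[j]]) / size
    return P

def satisfies_PU_equals_UQ(P, code, Q):
    """Check (PU)(i, l) = (UQ)(i, l), i.e. block sums of row i match Q(code[i], l)."""
    n, m = len(P), len(Q)
    return all(sum(P[i][j] for j in range(n) if code[j] == l) == Q[code[i]][l]
               for i in range(n) for l in range(m))

# Hypothetical example: the full 3-shift mapped onto the full 2-shift by 0 -> 0, 1 -> 1, 2 -> 1.
A = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
code = [0, 1, 1]
Q = [[Fraction(1, 3), Fraction(2, 3)], [Fraction(1, 2), Fraction(1, 2)]]
P = lift_by_equal_splitting(A, code, Q)
```

If the code is not right e-resolving, some positive $Q(k,\ell)$ has an empty set $J(i,k,\ell)$ and the check `satisfies_PU_equals_UQ` fails, which matches the role of the e-resolving hypothesis in the proof.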



The resolving maps, and the maps which are topologically equivalent to them
(the closing maps), form the only class of finite-to-one maps between nonconjugate
irreducible shifts of finite type which we know how to construct in significant
generality [5, 6, 66, 58, 17]. The e-resolving maps, and the maps topologically equivalent
to them (the continuing maps), are similarly the Markovian maps we know how to
construct in significant generality [18]. If $\Omega_A, \Omega_B$ are mixing shifts of finite type
with $h(\Omega_A) > h(\Omega_B)$ and there exists any factor map from $\Omega_A$ to $\Omega_B$ (as there will
given a trivially necessary condition), then there will exist infinitely many
continuing (hence Markovian) factor maps from $\Omega_A$ to $\Omega_B$. However, the most obvious
hope, that the factor map send the maximal entropy measure of $\Omega_A$ to that of $\Omega_B$,
can rarely be realized. Given $\Omega_A$, there are only finitely many possible values of
topological entropy for $\Omega_B$ for which such a map can exist [18].



6.2. All factor maps lift 1-1 a.e. to Markovian maps. Here "all factor maps" 
means "all factor maps between irreducible sofic subshifts" . Factor maps between 
irreducible SFTs need not be Markovian, but they are in the following strong sense 
close to being Markovian, even if the subshifts X and Y are only sofic. 

Theorem 6.4. [17] Suppose $\pi : X \to Y$ is a factor map of irreducible sofic
subshifts. Then there are irreducible SFT's $\Omega_A$, $\Omega_B$ and a commuting diagram of factor
maps

(6.3) $\begin{array}{ccc} \Omega_A & \xrightarrow{\ \gamma\ } & \Omega_B \\ \alpha \downarrow & & \downarrow \beta \\ X & \xrightarrow{\ \pi\ } & Y \end{array}$

such that $\alpha, \beta$ are degree 1 right resolving and $\gamma$ is e-resolving. In particular, $\gamma$ is
Markovian. If $Y$ is SFT, then the composition $\beta\gamma$ is also Markovian.

The Markovian claims in Theorem 6.4 hold because finite-to-one maps are
Markovian (Proposition 3.18), e-resolving maps are Markovian (Proposition 6.3), and
a composition of Markovian maps is Markovian. In the case when $\pi$ is degree 1
between irreducible SFTs, the "Putnam diagram" (6.3) is a special case of Putnam's
work in [82], which was the stimulus for [17].

6.3. Every factor map between SFT's is hidden Markovian. A factor map
$\pi : \Omega_A \to \Omega_B$ is Markovian if some (and therefore every) Markov measure on $\Omega_B$
lifts to a Markov measure on $\Omega_A$. There exist factor maps between irreducible
SFTs which are not Markovian. In this section we will show in contrast that all 
factor maps between irreducible SFTs (and more generally between irreducible sofic 
subshifts) are hidden Markovian: every sofic (i.e., hidden Markov) measure lifts to 
a sofic measure. The terms Markov measure and sofic measure continue to include 
the requirement of full topological support. 

Theorem 6.5. Let $\pi : X \to Y$ be a factor map between irreducible sofic subshifts
and suppose that $\nu$ is a sofic measure on $Y$. Then $\nu$ lifts to a sofic measure $\mu$ on
$X$. Moreover, $\mu$ can be chosen to satisfy $\mathrm{degree}(\mu) \le \mathrm{degree}(\nu)$.

Proof. We consider two cases.

Case I: $\nu$ is a Markov measure on $Y$. Consider the Putnam diagram (6.3)
associated to $\pi$ in Theorem 6.4. The measure $\nu$ lifts to a Markov measure $\mu^*$ on $\Omega_A$.
Set $\mu = \alpha\mu^*$. Then $\pi\mu = \nu$, and $\mathrm{degree}(\mu) = 1 \le \mathrm{degree}(\nu)$.

Case II: $\nu$ is a degree $n$ sofic measure on $Y$. (Possibly $n = \infty$.) Then there
are an irreducible SFT $\Omega_C$ with a Markov measure $\mu'$ and a degree $n$ factor map
$g : \Omega_C \to Y$ which sends $\mu'$ to $\nu$. By Lemma 6.8 below, there exist another
irreducible SFT $\Omega_F$ and factor maps $\tilde{g}$ and $\tilde{\pi}$ with $\mathrm{degree}(\tilde{g}) \le \mathrm{degree}(g)$ such that
the following diagram commutes:

(6.4) $\begin{array}{ccc} \Omega_F & \xrightarrow{\ \tilde{\pi}\ } & \Omega_C \\ \tilde{g} \downarrow & & \downarrow g \\ X & \xrightarrow{\ \pi\ } & Y \end{array}$

Apply Case I to $\tilde{\pi}$ to get a degree 1 sofic measure $\nu^*$ on $\Omega_F$ which $\tilde{\pi}$ sends to $\mu'$.
Then $\tilde{g}(\nu^*)$ is a sofic measure of degree at most $n$ which $\pi$ sends to $\nu$. □

To complete the proof of Theorem 6.5 by proving Lemma 6.8, we must recall
some background on magic words. Suppose $X = \Omega_A$ is SFT and $\pi : \Omega_A \to Y$ is a
1-block factor map. Any $X$-word $v$ is mapped to a $Y$-word $\pi v$ of equal length. Given
a $Y$-word $w = w[1,n]$ and an integer $i$ in $[1,n]$, set $d(w,i) = |\{w'_i : \pi w' = w\}|$. As
in [17], the resolving degree $\delta(\pi)$ of $\pi$ is defined as the minimum of $d(w,i)$ over all
allowed $w, i$, and $w$ is a magic word for $\pi$ if for some $i$, $d(w,i) = \delta(\pi)$. (For
finite-to-one maps, these are the standard magic words of symbolic dynamics [66, 58]; some
of their properties are still useful in the infinite-to-one case. The junior author
confesses an error: [17, Theorem 7.1] is wrong. The resolving degree is not in
general invariant under topological conjugacy, in contrast to the finite-to-one case.)






If a magic word has length 1, then it is a magic symbol. As remarked in [17,
Lemma 2.4], the argument of [58, Proposition 4.3.2] still works in the
infinite-to-one case to show that $\pi$ is topologically equivalent to a 1-block code
from a one step irreducible SFT for which there is a magic symbol. (Factor maps
$\pi, \phi$ are topologically equivalent if there exist topological conjugacies
$\alpha, \beta$ such that $\alpha\phi\beta = \pi$.)
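
The quantities $d(w,i)$ can be computed by brute force for a small example. The sketch below is a minimal illustration, assuming the SFT is given by an irreducible 0/1 adjacency matrix (so that every finite path is an allowed word) and the 1-block code by a list of labels; the names `x_words`, `d_table` and `min_d` are hypothetical. Because only words up to a fixed length are examined, `min_d` yields in general only an upper bound for the resolving degree.

```python
def x_words(adj, n):
    """All paths of length n in the 1-step SFT with 0/1 adjacency matrix adj.

    Assumes adj is irreducible, so that every finite path is an allowed X-word.
    """
    symbols = range(len(adj))
    words = [(s,) for s in symbols]
    for _ in range(n - 1):
        words = [w + (t,) for w in words for t in symbols if adj[w[-1]][t]]
    return words

def d_table(adj, code, n):
    """Return {(w, i): d(w, i)} for all Y-words w of length n (i is 0-based here)."""
    seen = {}
    for wp in x_words(adj, n):
        w = tuple(code[s] for s in wp)
        for i in range(n):
            # Record which X-symbols can sit at position i above the Y-word w.
            seen.setdefault((w, i), set()).add(wp[i])
    return {key: len(symbols) for key, symbols in seen.items()}

def min_d(adj, code, max_len):
    """Minimum of d(w, i) over words of length <= max_len (upper bound for delta)."""
    return min(min(d_table(adj, code, n).values()) for n in range(1, max_len + 1))

# Hypothetical example: a 3-symbol SFT mapped onto the even shift by 0 -> a, 1 -> b, 2 -> b.
adj = [[1, 1, 0],
       [0, 0, 1],
       [1, 0, 0]]
code = ["a", "b", "b"]

d1 = d_table(adj, code, 1)
d2 = d_table(adj, code, 2)
bound = min_d(adj, code, 4)
```

In this example the symbol a has a single preimage symbol, so $d(a,1) = 1$; since $d(w,i) \ge 1$ always, the bound is attained and a is a magic symbol.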

Proposition 6.6. Suppose $X$ is SFT; $\pi : X \to Y$ is a 1-block factor map; $a$
is a magic symbol for $\pi$; $aQa$ is a $Y$-word; and $a'Q'a''$ is an $X$-word such that
$\pi(a'Q'a'') = aQa$. Then the image of the cylinder $C_0[a'Q'a'']$ equals the cylinder
$C_0[aQa]$.



Proof. Suppose $PaQaR$ is a $Y$-word, with preimage $X$-words $P^j a^j Q^j (a^*)^j R^j$, say
$1 \le j \le J$, with the 1-block code acting by erasing $*$ and superscripts. Because
$a$ is a magic symbol, there must exist some $j$ such that $a^j = a'$, and there must
exist some $k$ such that $(a^*)^k = a''$. Because $X$ is a 1-step SFT, $P^j a' Q' a'' R^k$ is an
$X$-word, and it maps to $PaQaR$. This shows that the image of $C_0[a'Q'a'']$ is dense
in $C_0[aQa]$ and therefore, by compactness, equal to it. □

Corollary 6.7. Suppose $\pi : X \to Y$ is a factor map from an irreducible SFT $X$ to
a sofic subshift $Y$. Then there is a residual set of points in $Y$ which lift to doubly
transitive points in $X$.



Proof. Without loss of generality, we assume $\pi$ is a 1-block factor map, $X$ is a
1-step SFT, and there is a magic symbol $a$ for $\pi$. Let $v_n = a' P_n a'$, $n \in \mathbb{N}$, be a set of
$X$-words such that every $X$-word occurs as a subword of some $P_n$ and $a'$ is a symbol
sent to $a$. The set $E_n$ of points in $X$ which see the words $v_1, v_2, \ldots, v_n$ both in the
future and in the past is a dense open subset of $X$. It follows from Proposition
6.6 that each $\pi E_n$ is open. For every $n$, $E_n$ contains $E_{n+1}$, so
$\pi(\bigcap_n E_n) = \bigcap_n \pi E_n$. Thus the set $\bigcap_n E_n$ of doubly transitive points in $X$ maps to
a residual subset of $Y$. □



We do not know whether in Corollary 6.7 every doubly transitive point of $Y$
must lift to a doubly transitive point of $X$.

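Lemma 6.8. Suppose $\alpha : X \to Z$ and $\beta : Y \to Z$ are factor maps of irreducible
sofic subshifts. Then there is an irreducible SFT $W$ with factor maps $\tilde\alpha$ and $\tilde\beta$ such
that $\mathrm{degree}(\tilde\beta) \le \mathrm{degree}(\beta)$ and the following diagram commutes.

$$
\begin{array}{ccc}
W & \xrightarrow{\ \tilde\alpha\ } & Y \\
{\scriptstyle \tilde\beta}\,\downarrow & & \downarrow\,{\scriptstyle \beta} \\
X & \xrightarrow{\ \ \alpha\ \ } & Z
\end{array}
\tag{6.5}
$$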



Proof. First, suppose $X$ and $Y$ are SFT. The intersection of any two residual sets in
$Z$ is nonempty, so by Corollary 6.7 we may find $x$ and $y$, doubly transitive in $X$ and
$Y$ respectively, such that $\alpha x = \beta y$. Let $\Omega_F$ be the irreducible component of the fiber
product $\{(u,v) \in X \times Y : \alpha u = \beta v\}$ built from $\alpha$ and $\beta$ to which the point $(x, y)$ is
forward asymptotic, and let $\tilde\beta, \tilde\alpha$ be restrictions to $\Omega_F$ of the coordinate projections.
These restrictions must be surjective. Note that $\mathrm{degree}(\tilde\beta) \le \mathrm{degree}(\beta)$.

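If $X$ and $Y$ are not necessarily SFT, then there are degree 1 factor maps from
irreducible SFT's, $\rho_1 : \Omega_A \to X$ and $\rho_2 : \Omega_B \to Y$, and we can apply the first case
to find $\widetilde{\alpha\rho_1}$ and $\widetilde{\beta\rho_2}$ in the diagram with respect to the pair $\alpha\rho_1, \beta\rho_2$. Now for $\tilde\alpha$
and $\tilde\beta$ we use the maps $\rho_2\widetilde{\alpha\rho_1}$ and $\rho_1\widetilde{\beta\rho_2}$. □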
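
The fiber product used in the proof can be made concrete when both maps are 1-block codes on 1-step SFTs. The following minimal sketch, under that assumption, builds the fiber product as a graph on pairs of symbols with equal images and lists its irreducible components; the function names and the two examples are hypothetical. It only enumerates the components and does not select the one to which a given pair of doubly transitive points is forward asymptotic.

```python
def fiber_product(adj_x, code_a, adj_y, code_b):
    """Vertices and edges of the fiber product of two 1-block codes on 1-step SFTs."""
    verts = [(u, v)
             for u in range(len(adj_x))
             for v in range(len(adj_y))
             if code_a[u] == code_b[v]]
    # A transition is allowed when it is allowed in both coordinates.
    edges = {p: [q for q in verts if adj_x[p[0]][q[0]] and adj_y[p[1]][q[1]]]
             for p in verts}
    return verts, edges

def reachable(edges, start):
    """Set of vertices reachable from start by a path of length >= 0."""
    seen, stack = {start}, [start]
    while stack:
        p = stack.pop()
        for q in edges[p]:
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def irreducible_components(verts, edges):
    """Strongly connected classes that contain at least one edge."""
    reach = {p: reachable(edges, p) for p in verts}
    comps, assigned = [], set()
    for p in verts:
        if p in assigned:
            continue
        comp = {q for q in reach[p] if p in reach[q]}
        assigned |= comp
        # Discard transient vertices, which lie on no cycle.
        if any(q in comp for r in comp for q in edges[r]):
            comps.append(comp)
    return comps

# Example 1: full 3-shift (0, 1 -> a; 2 -> b) and full 2-shift (0 -> a; 1 -> b).
full3 = [[1] * 3 for _ in range(3)]
full2 = [[1] * 2 for _ in range(2)]
comps_full = irreducible_components(
    *fiber_product(full3, ["a", "a", "b"], full2, ["a", "b"]))

# Example 2: a 3-symbol presentation of the even shift paired with itself.
even = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
lab = ["a", "b", "b"]
comps_even = irreducible_components(*fiber_product(even, lab, even, lab))
```

In the second example the pairs (1, 2) and (2, 1) have no outgoing edges, so the only irreducible component is the diagonal, on which both coordinate projections are bijective on symbols.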